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\  ABSTRACT 

\  a  °  S’ 

^  This  thesis  consists  of  two  parts.  First,  a  condition  number/  for  the  exponential  of  a  tri¬ 
angular  matrix  S  is  introduced.  It  measures  how  sensitive  is  fr  to  relatively  small  perturba¬ 
tions  in  the  elements  of  S.  Secpbd,  a  new  technique  (matrix  argument  reduction)  for  compute 
ins  periodic  matrix  functioqd'  -scribed  and  discussed  in  detaiL  By  applying  this  technique 
to  the  computation  of  r^one  can  always  reduce  the  problem  to  one  m  which  the  eigenvalues 


lie  dose  to  the  real  axis. 
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INTRODUCTION 


The  exponential  matrix  plays  a  role  in  several  fields  of  mathematics 
and  applied  mathematics,  particularly  control  theory.  The  possibility  of 
computing  was  considered  as  soon  as  computers  became  available. 
Nevertheless  the  computation  has  proved  harder  than  might  be  imagined  for 
such  a  well  behaved  function  as  exp  I  In  1976  Holer  and  Van  Loan  published  a 
paper*  with  the  title 

"Nineteen  dubious  ways  to  compute  the  exponential  of  a  matrix.” 
Among  the  approaches  mentioned  in  the  Moler-Van  Loan  article  is  the  use  of 
the  Schur  form.  This  seemed  to  promise  a  stable  computation.  Nevertheless 
there  remained  one  weakness  in  this  approach. 

,  .  {Si i  $,.] 

Let  5*1  q  o  I  be  tbs  Schur  form.  Clearly,  if  S n  and  5as  have  com¬ 
mon  eigenvalues  than  5  should  not  be  split  up  to  facilitate  the  calculation. 
Lose  obvious  is  the  fact  that  accuracy  may  also  be  lost  when  Su  and  Sa, 
though  *ar  apart,  have  common  eigenvalues  in  their  exponentials.  This  has 
bean  he  stumbling  block,  and  the  only  stumbling  block,  to  the  success  of 
Schur  form  techniques. 

One  of  the  contributions  of  this  thesis  is  to  remove  this  obstacle  by  a 
new  technique  that  we  call  matrix  argument  reduction.  The  problem  is 
thereby  reduced  to  one  in  which  the  eigenvalues  lie  close  the  real  aids.  Con¬ 
sequently  exp(Su)  »  exp(5gt)  <=>  Sn  «  Sa.  Now  we  have  a  stable  pro¬ 
cedure  for  computing  exponentials  of  arbitrary  square  complex  matrices. 
Just  as  the  Singular  Values  Decomposition  (SVD)  is  not  always  the  method  of 
ohoice  for  Least  Square  problems,  our  procedure  may  not  always  be 

*  Teohsioal  report  79-283.  Dept.  of  Computer  Science,  Cornell  University,  I  theca,  Hew  York. 
This  appears  In  SIAM  Review. 


preferred  for  exponentiating  matrices  tA.  Nevertheless,  even  if  it  is  not  the 
fastest  technique,  our  approach  furnishes  a  reliable  procedure  for  tackling 
even  the  most  difficult  cases. 


Part  I. 


A  Measure  of  The  Sensitivity  of 
the  Exponential  of  Triangular  Matrices 
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1.  Introduction 


The  sensitivity  of  etA  to  small  changes  in  A  is  an  important  topic  in  the 
analysis  of  algorithms  for  solving  linear  systems  of  ODEs  with  constant 
coefficients.  In  finite  precision  arithmetic,  one  cannot  expect  to  do  better 
than  to  approximate  the  exponential  of  a  slightly  wrong  matrix  A+6A .  Con¬ 
sequently,  several  researchers  have  obtained  bounds  on 


'  1  ||eM|| 


(1.1) 


where  the  matrix  A+6A  is  a  small  perturbation  of  A.  See  [3],  [7]  and  [10]. 


Ve  cannot  improve  on  the  bounds  given  in  the  above  papers  for  a  gen¬ 
eral  matrix  A.  Our  work  stems  from  the  realization  that  these  days  a  prelim¬ 
inary  reduction  of  A  to  Schur  form  S  is  a  routine  matter  with  moderate  cost. 
Moreover,  having  reduced  A  to  a  triangular  S,  it  is  possible  (see  section  3.1) 
to  compute  a3  in  floating  point  arithmetic  to  yield  the  exponential  of  some 
S+6S  where  SS  satisfies  |&S|£ Constant -e- |S|  (element-wise).  Here  e  is  the 
arithmetic  precision.  Consequently,  it  suffices  to  examine  the  sensitivity  of 
a3  with  respect  to  small  relative  changes  in  5. 

In  practice,  the  computation  of  the  Schur  form  of  A  will  yield  the  exact 
Schur  form  of  some  A+SA  with  a  very  small  6 A  (A+6A  =  CJSUH ,  where  U  is 
unitary).  Our  results  give  a  realistic  estimate  for  the  difference 
||/l(es)  —  UireA*tAU ||.  Of  course,  whether  eA*tA  is  close  to  eA  depends  on 
both  A  and  SA.  This  question  is  not  fully  resolved.  As  a  preliminary  step. 
Appendix  I. A  offers  an  integral  representation  of  eA  —  eA*tA\  but  it  is  too 
complicated  to  allow  a  realistic  error  bound  in  terms  of  the  computed  Schur 
factors. 
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The  main  result  of  the  following  sections  is  a  bound  (Theorem  3.4)  on  the 
spectral  norm  of  the  matrix  COND(es;  i,j),  where  the  (p.g)  element  is 
defined  by 

[COND(es; =  *P<f  dSp,q^  " 

This  element  is  a  measure  of  the  change  in  the  (i  j)  element  of  es  to  a  small 
relative  change  in  sp  q.  To  condense  all  this  information  to  a  single  number, 
we  propose  the  condition  number  of  S  for  exponentiation  to  be 

cond(S;exp)  =  . 

where  D  is  diagonal  and  N  is  strictly  upper  triangular  and  such  that  S=D+N . 
An  alternative  name  is  the  exponential  condition  number  of  S. 


2.  A  Representation  of  Functions  of  Triangular 
Matrices 

2. 1.  Notations 

Let  Z  denote  the  abscissae  Z=(f lffg,  .  .  .  ,{n).  We  follow  [5]  and  use  A£{Z)f 
for  the  £-th  divided  difference  of  /  on  (&,&+],  ....  &+*). 

When  Z  has  exactly  A:  +  l  elements,  we  suppress  the  subscript  and  use 
A k(Z)f  to  denote  the  highest  order  (fc-th)  divided  difference  of  /  on  Z . 

For  a  matrix  A ,  |i4|  denotes  the  matrix  all  of  whose  elements  are  the 
absolute  values  of  the  elements  of  A$  i.e.,  141^=14^1*  The  notation  A^B 
means  that  Aij^Bij  for  every  i  and  j. 

Finally,  let  Efj  be  the  set  of  multi-indices  \o={ot,Oi,  .  .  .  ,  <J*),  where  all 
or 's  are  integer  and  i=ao<o,|<"  <o*  =j  j.  For  examples, 

Eij=  . and  so  on. 

Notice  that  if  a€£fj,  then  i+l  ss  Oi  £  j  -(k  - 1 )  for  1=1,2 . k—  1. 

2.2.  A  Representation  of  f  (S)  in  Terms  of  Divided  Differences 

Given  /  an  analytic  function  and  S=(st  j)  an  upper  triangular  matrix  ,  every 
element  of  f(S)  can  be  written  in  terms  of  the  exponential  divided 
differences  on  eigenvalues  of  S.  This  representation  (Theorem  1)  can  be 
found  in  Van  Loan  [9]  (see  also  [6]  and  [8]).  Here  we  give  a  different  proof 
which  is  simpler  than  the  one  in  [9]. 

For  simplicity,  let  {  denote  an  eigenvalue  of  S,  i.e.,  { i=siii ,  i=l,2 . n. 

According  to  [2],  when  /  is  a  holomorphic  function  defined  inside  and  on  a 


simple  closed  contour  C  in  the  complex  plane  (positively  oriented),  then  the 
divided  difference  A i(Z)f  has  the  following  representation; 


hk(y\f-  1  f _ f(o)do _ 

)J  2m  •£  («-fc)(u-^m)  ”(«-&♦*) 


(2.2.1) 


Another  useful  representation  of  the  divided  difference  is  Hermite-Genocchi 


formula  (cf.  [5]) 


i  "j  "i-i 


*t(Z)f=J£J£~  Jf  (2.2.2) 

where  denotes  the  fc-th  derivative  of  /.  This  formula  will  be  used  in 
later  sections. 

The  key  to  the  result  is  a  formula  for  elements  of  the  resolvent  of  5. 


Lemma  2.1.  Given  z  not  equal  to  any  of  the  eigenvalues  of  S.  the  resol¬ 
vent  of  S,  X=(zI~S)~l,  has  the  following  representation: 


0 

1 

(*-Ci) 


if  j<i, 
if  i  =j . 


y  S<W  ***-.«> _  dj>i 


(2  2.3) 


Proof.  Let  X=(xij)  be  defined  according  to  (2.2.3)  and  let 
C=(cij)-X  (zf—S).  We  want  to  show  that  C  is  the  identity  matrix.  Since  S  is 

upper  triangular,  it  is  obvious  that  c<j=0  for  j  <i  and  cw*l  for  i=l . n.  To 

show  that  Cij =0  for  j  >i,  note  that  for  j  >i. 


cij  =  „(*/— 5)mj 

m*t 

=  *tj(z-f/)  +  Zi.m  i-Smj)  ■ 


I 
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It  suffices  to  establish  x^z  -£*)=  ;£  .  From  section  2.1,  if  oeEfj. 

m=i 

then  a fc=j  and  a0=i.  By  the  definition  of  we  have 

*j<*-<i>  =  £  £  r*  , 

*=»ae£fj  'Z  *V  l*  GW 

_  1  x  ^  V  Sgo-gi  "Sg>-8-g*-i 

<«-<■.„)  ■  •*-■"* 

Set  m.=a*-i.  Since  j—lS:crjfe-i==m^i+fc--l  (see  section  2.1),  the  right  hand  side 
becomes 

1  .  ^  V  /  V  Sg0gl  '  Sg*-gm 

(Z  _Ci)  Jt=2  I7»=i+fc-l  m  i 

interchanging  5]rs,  we  get 

_  .  ,  ^  /m^+1  v»  *m _ x  0 

Zi.i'SiJ  +  Zi  V  Zj  Zj  /2_>  \/2_>  ^  'Srn-i  * 

m=i+l  Jk=2  '«*-*'* 

By  the  definition  of  xitffv,  the  above  expression  is  equal  to 


m=i+l 


This  proves  the  lemma.  ■ 


Let  C  be  a  simple  closed  (positively  oriented)  contour  that  encloses  the 
eigenvalues  of  5.  It  cam  be  shown  (see  [2])  that 

/  (S)=  (z  )(z/-S)-ldz.  (2.2.4) 

Theorem  2.2.  Given  S  upper  triangular  with  eigenvalues  &=«*,*  and  / 
analytic  on  some  region  containing  f  l.fe . fn.  we  have 


-  amm 
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0  ifjr<i, 

/(ft)  i 

2  S*0.»,'S  »,.**■  ■■*»*_!  .9k'^k{.Zq)f  ifj>i 


where 


(2.2.5) 


Proof,  From  Lemma  2.1  and  (2.2.4),  we  have 


f(s) iJ  =  g 

where  (xij)=X=(zf—S)~1.  Consequently  f(S)ij=  0  when  j  <i  and 
f(S)i.i=f(Ci).  For />i. 


°.U4*v'' "•'-'■‘w-fc— 


f  ^)(*  — f »|)"  ■  (*  ~(*j) 


■) 


~wJ?*  Jvi  S’t-irk ' A*  (Z^/- 

“H 


This  proves  the  theorem. 


The  theorem  has  some  interesting  consequence. 

Corollary  2.3.  Let  S=D+N  where  D  is  diagonal  and  JV  is  strictly  upper 
triangular.  For^^i  and  f&O, 

l/(*S)tjl  *  (•l|jr,kT*jafl/t,“<,(*OI.  (2.2.8) 

where  Ji  is  the  convex  hull  of  ft .....  ft . 

Proof.  We  first  prove  that  (cf.  [6]).  for  k  >0, 
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f(S)ij  = 


where 


0 

f« i) 

S  E>, 

*»»  Otjfj 

■  •  fa*)- 


if  i=j. 

'®Mi '  ^»)/  itj>i 


(2.2.5) 


Proof.  From  Lemma  2.1  and  (2.2.4),  we  have 

/(s)ij  =  £rfcf(*>xudz. 

where  (xij)=X=(z/—S)~l.  Consequently  f(S)ij=  0  when  j«  and 
/(5)i.<=/(fi).  For;*, 


=tf  v  *  «  /  1  /*  f  (ar)cte  \ 

vr  S^V'a, i  4  1 


E  Vr  • 

*’Wj 


This  proves  the  theorem.  ■ 


The  theorem  has  some  interesting  consequence. 

Corollary  2.3.  Let  S=D+N  where  D  is  diagonal  and  N  is  strictly  upper 
triangular.  For^ii  and  fiO, 

l/<«s)ul  (2.2.6) 

where  0  is  the  convex  hull  of(i . (y. 

Proof.  We  first  prove  that  (cf.  [6]).  for  k  >0. 
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(N*)iJ  = 


£  ®Vi  s,i.*i  s,n-**  if  j-iifc, 


otherwise  . 


It  is  obvious  that  (2.2.7)  is  true  when  k=l  and  that  (JV*)ij=< 
j  <i+Jfc.  Assume  it  is  true  for  some  fc^l,  then  for  jii+fc  +  1 

=  H  (Nk)ijn  NmJ 

m*i+k 

=  £  *V|’ Vrt  **»j  • 

By  the  principle  of  induction,  (2.2.7)  is  true  for  all  Jfc. 

Now  insert  the  inequality  (see  [5],  section  1) 

|A*(2,)/ 1«  j~migc  l/^HOI.  foraU  <r€£fJt 

into  Theorem  2.2.  For  j  >i , 

l/(*5)ui  m«  |/W(«)I 

where 

J*u(0*jb( jfeT'  E  I ” ■*•*-».•*!)' ** 

*■»  •‘ffj 

Because  of  (2.2.7),  the  polynomial  in  (2.2.9)  can  be  written  as 

k*l 


vi»i> 


(2.2.7) 

whenever 


(£2.8) 

(2.2.9) 

(2.2.10) 


When  i=j,  both  sides  of  (2  uqual;  therefore,  together  with  (2.2.8)  and 


(2.2.10),  we  have  for 


\f(tS)ij\ *(•*'"')„  mgt  |/^)(()|.  - 
When  /  is  specialized  to  exp  we  obtain  the  result  we  need  in  the  sequel. 


Corollary  2.4.  Given  S  =D+N  as  in  Corollary  2.3.  we  have 

|«^|  ^  (2.2.11) 


Proof.  Applying  Theorem  2.2  to  /  =exp  we  get  for  j  >i 

l(Oijl*£  E  |s-^1"*-*_I.v*MAfc(^)exp|. 

Since 


|  A*  (Z)exp  |  *  Afc  (ReZ)exp  (2.2. 12) 

(the  reader  can  prove  it  directly  from  (2.2.2),  or  see  [5]),  we  have 

\*s\u<i:  s  i ®vo.*ti *■  ■!*»*.»* i  ■A*(Re£ff)exp 
=  (*<*(/»♦  |*l))u.  . 


Remark.  Corollary  2.3  is  not  new.  It  may  be  found  in  [9],  though  the  proof  is 
different.  Corollary  2.4  is  a  sharpening  of  a  result  in  [9],  which  asserts 

where  a(S)=maxRe(V  Corollary  2.4  may  be  proved  by  using  certain 
differential  inequalities,  see  Appendix  I.B. 
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3.  A  Condition  Number  for  The  Triangular  Matrix 
Exponential 

3L1.  The  Sensitivity  of  (e*)|j 

Let  us  define,  the  matrix  r(S)=Re(Z?)+|.Af  |  where  D  is  diagonal  and  N  is 
strictly  upper  triangular  such  that  S=D+N.  We  shall  examine  the  relation 
between  the  quantity  (er^)«j  and  the  change  in  (e5)<j  under  a  small  rela¬ 
tive  perturbation  in  S. 


Given  any  function  g(x)  and  a  small  relative  perturbation  (l+6)x  of  x,  a 
good  measure  of  the  rate  of  change  in  g(x)  with  respect  to  x  relatively  is 


^g(x(lti))  g(x)  _  x.g,(xy  (3.1.2) 

If  we  consider  (e5)tJ  as  a  function  of  . Sjj  ,  then  the 

corresponding  measure  of  the  rate  of  change  in  (es)ij  with  respect  to  a  par¬ 
ticular  element  sp  g  (i^p.q^j)  is 


Ha 


6Ha 


We  may  call  the  above  measure  the  sensitivity  of  (es)tj  with  respect  to  sp.q- 
Associated  with  each  (es)tj  is  a  matrix  of  sensitivities, 


COND((e*)tJ)  = 


®i.r 


9*i.t 


*i.a 

Sgjf 


0(es\j 

9®  1.2 


9®  8.2 


®1.» 


9®i.» 


9(®5)<j 

9®».i» 


Our  objective  is  to  find  a  bound  on  ||COND((es)tj)||  (see  Theorem  3.4). 
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We  shall  use  the  following  notations. 

i_ 

||5||  ■  (max.  eigenvalue  of  SHS)2  (spectral  norm), 

||5||,  =  max2l*«jl  ( 1-norm), 

||S IU  “  1 I  (°°  -  norm). 

i  i 

GxH  *fo,h):g€G.h<zH\  . 

Recall  E?j  is  the  set  of  multi-indices  {a:  a=(ffo.CTi . a*).  where 

Let  Eij  =  \JE?j  .  Also  recall  Za=({ag . £«*)  where 

tk  =  skjt'  fc=l,...,n.  By  Theorem  2.2  (e5)<j  has  the  following  representation 

(«5)« J  =  S  E  *'0.o1'  sot-l.'t  A*(Z<,)exp.  (3.1.4) 

*-»  rtgfj 

Note  that  (s1^)*  j  has  the  similar  representation 

(•r(5))ij=S  E  Isco.*,  I  ”  I  I  A*(ReZ»)exp. 


We  begin  by  proving  the  following  two  lemmas. 

Lemma  3.1.  For  any  integer  m  between  i  and  j  — 1.  let  R— 
[4,m]x[m.+lj],  Le.,  (p.g)e/?  if  and  only  if  m  +  1 Then 


l*(«r<S))<j- 


(3.1.5) 


Proof.  Let  Xp,q  denote  all  o  in  Eij  such  that  for  some  l.  a*  =  p  and  Ol+i  -  q. 
Since  m  lies  between  i  and^—1,  for  any  o  in  there  must  be  some  Oj  and 
Oi+i  such  that  oi  <  m  <  0|*j.  Thus 


Bij  =  U  Xp.i' 
&>.*)«* 


However,  it  is  not  difficult  to  see  that  for  (p,q ),  (r.s)ej?. 


Xp.q^Xrs  ~4>  U  (P .9 )*(*••* )• 


So,  Eij  is  a  disjoint  union  of  Xp  q .  (p,q)zR.  Since  for  a€£ij 

s*o*i  if  a^-Xr* 


Sr*  dSfjf  (*•**»"  s°h-v°k  ^k  - 

we  hove,  using  the  representation  of  (e5)t  j. 

v  L  I-  ^  I-  a 


otherwise. 


(p*) 


= cIhw  av. 


Therefore,  by  (2.2. 12) 


E  lSP-»  ‘  a_  ^“1  S  2  !*•»*! I ' I  ’^(ReZ,) 

ftj^j 

=(«r{5))<J.  ■ 


Lemma  3.2.  Recall  fm=  sm  m  is  the  eigenvalue  of  5.  1  asm. in.  Then 

£  * m„|f, |  (.««), J. 

To  prove  this  lemma,  we  need  the  following  key  result  which  brings  in  Reft  in 
place  of  ft. 

Lemma  3.3.  For  any  ffG£f/ 

2  I  *  A*(ReZ#). 

MBl 


Proof  of  Lemma  3.3.  For  simplicity,  we  write  A*  for  A*(Z0)  whenever  c  is 
fixed.  From  the  Hermite-Genocchi  formula  (2.2.2),  we  have 


2 


i  •’i  •'•-i 


A*  =  //-  /  exp[fa0+(f(,I-<-O9)i'1+  -+(f.4-fCt_1)^]ci^-  di'i. 

0  0  0 

Take  the  partial  derivative  of  A*  with  respect  to  ^  and  set  i/0=l.  i/*+I=0  to 


find  that 


a&T’  =  {  {  ~Vp ]dv*"  dvi- 

Note  that  lii/jiti/gi  so  (vp-vp+i )>0  and  £}  (vp -i/p4.,)=l.  Hence 

Pm o 


£  Jr—  *  f  f  dvk  -  dvx 

P"0  °\lp  0  0  p  /-l  J  * 

1  ‘V-‘  * 

=  /  ••  /  e*p[Ref##+£(R«f«i“Re^-|),//](1,/*^ ”■ dvi 


0  0 


=  A*(ReZ.). 


0A*  _ 


To  complete  the  proof,  notice  that  — —  =  0  for  <*»**&..  O^p^A:,  so 

0{m  * 

£  I  ST- 1  *  £  I  £-#(Z,)\  *  A‘(ReZ#).  . 
mr  1  fls»»  >«0  ®C, 


Proof  of  Lemma  3.2.  Using  the  representation  of  (o5)tj  (3.1.4),  we  have 

M«1  °Sm  pi«i  °Sm 


•£rA*(z,)l 


By  Lemma  3.3,  it  ia  less  than 


<p  I *El I* I  'I I  ’^(W)  • 


f 


Lemma  3.1  and  Lemma  3.2  together  yield  the  following  theorem . 

Theorem  3.4.  The  sensitivity  matrix  COND((es)tj)  of  each  element 
(eS)ij  of  es  satisfies 


||C0ND((e5)tj)||  *  (1  +  max|fc|K«|W)u. 


Proof.  Since 


g(«5)i j  _ 


=  0  for  p<i  or  q>j. 


||COND((es)ij)||,  =  max  2 1  *, *.—**- 

l«m*n  ustjn 


iu, + .  s  Jv,-^ 


»■*  (p.q)fjt 

where  /? =[i,rn  ]x[m +• i,j  ].  Hence,  Lemma  3.1  and  Lemma  3.2  imply 
||COND((e*)u)||1*  (1  +  max| fc  |)(e™)tJ. 


Similarly 


Since 


||COND((es)„)||..*  (1  +  maxk1|)  (er^)u. 


||*||8  =  max.  eigenvalue  of  ( BHB ) 

*11**11,11*11, 

=  ||*||.-||*|||. 


the  theorem  follows.  ■ 


(3.1.6) 


The  theorem  and  lemmas  in  this  section  suggest  that  (e1^)^  is  the 
essential  quantity  that  measures  the  norm  of  the  matrix  COND((e5)tj).  In 
fact,  if  we  allow  no  perturbation  in  the  diagonals  (ys  (i.e.,  replacing  the  diag¬ 
onal  of  COND((s,)i.)  by  zeros),  then  ||COND((e5)1J)||  * («rW)iJ.  The  factor 
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max |  |  in  (3.1.6)  comes  only  from  the  effect  of  small  relative  changes  on 

the  diagonal.  This  factor  can  be  reduced  considerably.  For  example,  con¬ 
sider  the  one  dimensional  case:  compute  e~1000.  The  logarithmic  derivative 
of  e  “,00°  is 


-1000  „ 
c-iooo  8 


-i°oo  _  _10oo. 


which  indicates  correctly  that  one  ulp’s  (unit  in  the  last  place)  change  in  the 
argument  -1000  will  result  in  a  thousand  ulp’s  change  in  e-1000.  However, 
modern  implementations  of  exp  have  taken  the  trouble  to  do  better  than  get¬ 
ting  e**14'*). 

As  a  matter  of  fact,  in  [5]  we  exhibit  an  algorithm  for  computing  the 
exponential  divided  differences  with  guaranteed  accuracy 

|/I(Af(Z)exp)  -  Af(Z)exp)  <  e-A*  Af(ReZ)exp. 


where  e  is  the  precision  of  the  arithmetic  and  A*  depends  on  k  only.  Thus,  a 
typical  error  analysis  will  show  that  if  one  computes  es  by  formula  (3.1.4), 
then  for  some  AP/_t  depends  only  on  the  difference  j  — i, 

I  fK(*S)ij)  -  («5)u  I  *  '  (3-1.7) 


Conclusion.  In  view  of  the  above  discussion,  we  define  the  absolute  sensi¬ 
tivity  of  ( es)tj  to  be  (er^)«j  (ignore  the  factor  (l+maxl^  | ));  and  the  rela¬ 
tive  sensitivity  (relative  to  (®s)ij)  to  be  Pij-  fv— $.^*4-  .  provided  (e5)tj  *0. 

I V®  JiJ  I 


3.2.  Perturbation  Bounds 


Since  the  matrix  exponential  problem  is  closely  related  to  the  solution  of 
linear  systems  of  O.D.E. ’s  one  might  suspect  that  similar  results  concerning 
the  sensitivity  of  would  have  been  in  the  O.D.E.  literature  .  For  example, 
a  simple  corollary  of  the  following  integral  representation  (cf.  Bellman  [1]) 
yields  perturbation  bounds  similar  to  our  results  in  Theorem  3.4. 

Theorem  3.5.  Let  A  and  B  be  square  matrices. 

.*(<*♦*)  _  t*A  =  f  (B)  e«A+B>dT  (3.2.1) 

0 

Proof.  Let  X(t)  be  the  left  hand  side  of  (3.2.1).  It  is  straightforward  to  see 
that  X(t)  satisfies  the  O.D.E. 

^ T(f)]  a  «-**  B  eu. 
at 

The  theorem  follows  by  integrating  the  above  equation  from  0  to  t.  ■ 

Let  6S  be  a  triangular  perturbation  of  an  upper  triangular  matrix  5. 
Corollary  3.8.  If  )d5js;e|,?jt,  then,  as  e-»0, 

| es*as  -es|s  e- (max | 1 1 •/  +  |-r(S))  e*5*  +  0(e2)  (3.2.2) 

t  w 

Proof.  (3.2.1)  implies,  as  e-*0, 

t 

at(S+iS)  _  ats  -  erSdr  +  ^(s8). 

o 

Note  that  P (S)  dominates  |  S  |  except  on  the  diagonal.  In  fact 

|4S|<  e-|5|  <  e(r(S)+  2max|Si#i|-/). 

f  Recall  |  denotes  the  matrix  whose  elements  ere  the  absolute  values  of  the  elements  of 

A. 


I 
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Since  |e5|  :S  e^S)  (Corollary  2.4)  and  ~e^s^r(S),  we  get 

I  gS^S  _  gS  |  £  y8(l-r)r (S).  |  ^ 1  .erFXJ)dT  + 

0 

1 

^ ye(I_T)T(S).e'rfXS)(r(1s)+2max| st<t  |  I)dT  +  ©(e8) 

Q  * 

^•(maxl^.t  |  •/ +  5-lX<S))-er^  +  0(e2).  ■ 

i  C 

We  should  mention  that  Corollary  2.4,  namely,  |es|  can  be 

derived  without  using  the  explicit  representation  of  es  given  in  Theorem  2.2. 
In  fact,  it  is  just  a  corollary  of  the  following  typical  differential  inequality  in 
O.D.E.  . 

Theorem  3.7.  If^(0)&  |?9(0)|  and^'(f)^  |p*(f)|  (f^O).  then 

**)*?(*)  (f*0).  (3.2.3) 

Proof.  It  follows  from  integrating  if'  and  from  0  to  t.  ■ 

Corollary  3.8. 

|«»|  sSe1™. 

The  proof  is  given  in  Appendix  I.B. 

It  is  possible  that  someone  has  already  obtained  results  similar  to 
(3.2.2)  in  works  related  to  O.D.E.  ;  but  our  search  of  the  literature  has  not 
revealed  it.  To  compare  the  result  of  our  approach  (using  the  explicit 
representation  of  (e5)*j)  with  Corollary  3.6,  we  give  a  perturbation  bound 
derived  from  Theorem  2.2.  Recall  (es)ij  has  the  following  representation 
(for  simplicity,  we  drop  the  reference  of  exp  in  Afc(Zc)exp) 

(®5)ij  =  2  'A*CZ<r)  (3.2.4) 

*■*  ^ 


I 

iMMfclfl— H— iiMi  fra  i  Tii  nifrilillilU^lMMIfrfri^ii  r  nr 
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Theorem  3.9.  If  |  SS  |  ^  e  1 5 1 ,  then,  as  e-»0, 

l(e5t<5  -  es)ij  |  £  e(j^+max|s<i<|)(er(5))<J  +  0(c?)  (3.2.5) 

Proof.  A  typical  component  in  the  right  hand  side  of  (3.2.4)  is  of  the  form 

0la2  *..<VA*P0  (3.2.6) 

where  •  •••(*]  is  a  subset  of  the  eigenvalues  of  S .  The  correspond¬ 

ing  component  in  the  perturbed  element  (es+6s)t  j  is 

(al+dai)'(a2+<5£i2)*"(ttfc  +da^)*  A*  (X +dJf)  (3.2.7) 

where  |  dc^  |  |  a*  |  and  |  df <  |  |  £*  | .  Since  A*  is  a  smooth  function  of  each  of 

its  argument,  we  can  expand  A*(X+dX)  at  X  to  get 

A  k(X+6X)  =  tk(X)  +  dA*(X)  +  0(e2)  (3.2.8) 

where 

uk(x)  =  £ti~*k(n 

tsO  °S< 

It  follows  from  Lemma  3.3  that 

| d  A* (X) |  <:  max |  £t  |  Afc(Re*). 

Now,  subtract  (3.2.6)  from  (3.2.7)  to  get 

ftfo+do^Jr+d*)  -  flch-itiX)  = 

|i»l  i  =  l 

!B|ft“|-[ft(l  +  ^r-)*(AfcW+«AfcW)  -A*(*)]  +  0(e2) 

|i=i  <*i  ®i 

l1^  I  •Afc(Re^f)[(l+e)fc— l+emax|  £*  |  ]  +  0(e2) 

is|  ' 

as]*[|at|AA!(ReAr)(£fc+emax|(i  |)  +  0(t2). 

<sl  * 


r 
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As  a  consequence,  we  have,  as  e  -*  0, 

^  e(j -i  +max | «i,i | ) -^  S  ls<ro.«,l  "  I***.,.**  I -Afc(ReZ„)  +  0(e2) 
k=l  <*£tj 

^  e(J-i+maxlsi4if)  (e^)ij  +  0(ez).  • 

The  reader  should  notice  that  the  perturbation  bounds  (3.2.1)  and 
(3.2.5)  are  similar  but  different  It  is  not  clear  which  one  is  sharper  in  gen¬ 
eral. 


3.3.  An  Exponential  Condition  Number 

Recall  that  r(S)=Re(2?)+|  JV|  where  S^D+N,  D  is  diagonal  and  N  is  strictly 
upper  triangular.  In  section  3.1  we  defined 


Pij(s)  = 


l(eS)tjl 


(3.3. 1) 


to  be  the  relative  sensitivity  of  (es)ij%  In  case  of  (e5)i j  =0  and  j  /0, 

ire  set  The  reason  is  simple  :  when  (es)ij=0,  a  tiny  perturbation  in  S 

will  turn  into  a  non-zero  element  and  hence  the  relative  change  of 

(es)ij  is  ~.  When  (er(s))tj=0,  we  dclne  p<j=l.  For.  it  can  be  shown  that 
from  the  representation  of  (e*V))t  j,  j  =0  implies  S  is  reducible  :  some 

integer  m  exists  such  that  sp  g  is  equal  to  zero  for  i&p^m  and  m  +  l£q  <j  ; 
Le.,  the  sub-matrix 


®t.i  *«.<+ 1  •  Sij 

siJ 


S,  0 
S2 


f 
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In  this  situation,  only  perturbations  in  5,  and  52  are  considered  and  (e5)tJ 
remains  zero. 


The  measure  pij  is  quite  realistic,  in  the  sense  that  the  relative  error  in 
the  computed  (e5)tj  (by  some  some  of  our  best  numerical  method)  agrees 
with  Pij  in  magnitude.  Let  us  consider  the  following  example. 


5  = 


1  3  2.7727 

0  —2 
~1 


The  exponential  of  5  and  T\5)  are  (correct  to  5  significant  decimal  digits) 


2.7183  5.1548  -3.4592, 0-6 

2.7183  5.1548  6.5190 

II 

m 

1  -1.2642 

ens)- 

1  1.2642 

1 

0.36788 

0.36788 

The  sensitivity  of  the  (1,3)  element  is 


6.5190 

Pl,3_  |-3.459210-6| 


«  I.88106. 


which  indicates  that  (es)1>3  is  sensitive  to  rounding  error.  For  instance,  if  one 
uses  Parlett’s  recurrence  ([5])  using  5  significant  decimal  digits  arithmetic, 
one  obtains 


/l(«5>= 


2.7183  5.1549  4.7670,0-6 
1  -1.2642 

0.36788 


Notice  that  not  even  one  digit  is  correct  in  the  (1,3)  element.  This  is 
expected  since  a  perturbation  of  2.7727  to  2.772707  will  yield  an  exponential 
equal  to  the  above  fl(es)  up  to  five  significant  decimal  digits. 

In  the  above  example,  the  (1.3)  element  is  so  small  that  it  is  negligible 
compared  to  the  other  elements.  Therefore,  the  inaccuracy  of  that  element 
will  not  have  much  effect  on  the  whole  matrix.  A  reasonable  single  measure 
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for  the  matrix  as  a  whole  is  the  following  : 

Definition.  The  condition  number  for  the  exponential  of  a  triangular 
matrix  S  with  respect  to  a  given  norm  is  defined  to  be 

cond(S)=cond(S,exp)=  .  (3.3.2) 

II®  II 

In  the  above  example,  cond(S)  »  1.8. 


Remark  1.  In  practice  we  are  only  interested  in  corufs  order  of  magnitude. 
Numerical  examples  show  that  the  number  of  decimal  digits  lost  in  fl(es)  is 
visually  smaller  than  logioConcf(S)  .  Since  there  is  no  need  to  compute 
concf(S)  accurately,  a  fast  method  can  be  employed  to  compute  for 

cond(S)  (notice  that  T(5)  is  real).  The  cost  of  computing  e1^  adds  about 
20%  to  the  cost  of  the  Schur  form  and  the  cost  of  es. 


Remark  2.  We  have  performed  numerous  experiments  on  various  matrices 
based  on  the  numerical  algorithm  (fcr  the  matrix  exponential)  suggested  in 
[5].  We  compared  the  relative  error  in  namely 

!!/»«*> -«*ll 
11**11 

•with  t-cond(S)  (where  t  is  the  machine  precision)  and  found  that  cond(S)  is 
a  good  predictor  of  the  error  in  magnitude,  provided  that  the  matrix  S  is 
first  reduced  to  an  equivalent  matrix  that  has  eigenvalues  bounded  in  their 
imaginary  parts  by  a  small  number  like  n  (cf.  Part  II  Matrix  Argument  Reduc¬ 
tion  ). 


I 
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In  general,  if  F(^)  w  S  (cond(S)  *  1),  then  we  expect  can  be  com¬ 


puted  accurately. 
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4.  Numerical  Results 


Two  sets  of  examples  are  considered  in  this  section:  the  first  one  con¬ 
tains  matrices  that  approximate  ln(Jf)  where  is  the  Jordan  block  with  £  on 
the  diagonal.  The  second  set  contains  matrices  of  dimension  6  with  various 
imaginary  parts  on  the  diagonal  elements. 

The  computation  of  es  consists  of  two  steps.  First.  S  is  reduced  to  a  tri¬ 
angular  matrix  C  whose  eigenvalues  are  near  the  real  axis.  Then,  using  the 
Newton  polynomial  of  exp(C), 

exp  (S)  =  exp  (C)  =  A  ?(Z)exp-/  +  ^AftZjexp' ft  (<?-{)/). 

Here  #=(£i . £»)  is  the  diagonal  (eigenvalues)  of  C.  The  coefficients 

Kf(Z)exp  are  computed  by  a  hybrid  algorithm  (SH)  suggested  in  [5].  The 
details  of  the  computation  will  be  given  in  another  paper.  Because  the 
exponential  divided  differences  can  be  computed  accurately.  Newton's 
method  is  quite  reliable  (although  it  is  slower  than  Parlett's  recurrence  [6]). 

In  what  follows,  we  use  single  precision  arithmetic*  to  compute  fl(ss) 
and  cand(S),  The  relative  error  in  fl(es)  and  the  number  cond(5)c 
(e=2"84)  will  be  compared  . 


*  All  computations  in  this  section  are  done  on  a  Vax  1 1  /780.  The  single  precision  arithmetic 
is  decimal  significant  digits  and  the  double  is  ~10  .  Yax  is  a  trademark  of  the  Digital  Equip¬ 
ment  Carp. 


Example  (I). 


The  general  form  of  the  matrix  considered  in  this  example  is  the  matrix 


ta<r-sk 

Inf  i- 


(-D* 

( n-l)r~ 1 


5c  =  /I(ln(/f))=/I( 


where 


f  1 

'<  =  l- 

f 

Because  of  the  roundoff,  exp(S{)  is  close  but  not  equal  to  J(.  Our  results  are 
summarized  in  Table  (4.1)  and  (4.2).  In  Table  (4.1),  the  first  column  is  the 
value  of  the  second  column  is  the  dimension  of  S$  the  third  column  is 

— — 

and  the  last  column  is  err/s  (e=2-84),  where  err  is  the  relative  error  in 
/l(exp(Sf))  in  norm 

firr_(Trr/,,/rrn/«  ||/i(exp(Sf))-exp(Sf)||, 
err -err  (ft  (exp(5f ))) - - . 


Table  (4.2)  contains  the  details  of  the  elements  of  S(  and  exp(S()  when  n=5 
and  f=0.2S.  The  reader  should  notice  that  the  condition  number  of  each  ele¬ 
ment  is  a  good  prediction  of  the  relative  error  in  that  element 
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n 

c  and  (St) 

err/  e 

5 

2.5 

0.3 

1 

10 

5.0 

1.0 

15 

7.5 

1.7 

5 

10.4 

4.0 

0.5 

10 

341 

170 

15 

10921 

1615 

5 

68 

32 

0.25 

10 

7X104 

4X104 

15 

8xl07 

3x10* 

Table  4.1.  Condition  numbers  and  the  relative  error  in  exp(^). 


SiJ 

/«(e5)tj 

correct  value 

p*j 

rel  err/t 

(1.1) 

-1. 386294c +00 

2500000e-01 

2.500000-01 

1.0e+00 

6.4e-02 

(1.3) 

4.000000e+00 

1.000000e+00 

1.000000e+00 

1.0e+00 

6.4e-02 

(1.3) 

-8.000000e+00 

aooooooe+oo 

0.000000e+00 

4.0e+00 

1.0e+00 

(1.4) 

2. 133333e+01 

a000000e+00 

1.589457e-07 

1.0e+06 

1.7e+07 

(1.5) 

-6.400000e+01 

2861023e-06 

6.357830e-07 

1.0e+08 

5.9e+07 

(2.3) 

-1.386294e+00 

2.500000e-01 

2.500000e-01 

1.0e+00 

6.4e-02 

(2.3) 

4.000000e+00 

1.000000e+00 

1.000000e+00 

1.0e+00 

fl.4e-02 

(2.4) 

-fl.000000e+00 

aooooooe+oo 

0.000000e+00 

4.0e+00 

l.Oe+OO 

(2.5) 

2.133333e+01 

O.OOOOOOe+OO 

1.589457e-07 

I.0e+08 

1.7e+07 

(3.3) 

**1.38629+9+00 

asooocoe-oi 

a500000e-01 

1.0e+00 

6.4e*02 

(3.4) 

4.000000e+00 

1.000000e+00 

1.000000e+00 

l.Oe+OO 

6.4e-02 

(3.6) 

-8.000000e+00 

0.000000e+00 

0.000000e+00 

4.0e  400 

l.Oe+OO 

(4.4) 

*1.386294e+00 

2500000e-01 

2.500000e-01 

l.Oe+OO 

6.4e-02 

(4.5) 

4.000000e+00 

1.000000e+00 

1.000000e+00 

l.Oe+OO 

6.4e-02 

(5.5) 

-1.386294e+00 

2500000e-01 

2.500000e**01 

l.Oe+OO 

6.4e02 

Table  4.2  The  elements  of  exp(£{)  and  their  sensitivities 
(with  n  «5  and  f»0.2S). 


Example  (n). 


-20 ki  0.25 
—10 lei 


—12.5  75 

0.25  -12.5 
-5 hi  0.25 
0 


-99  183.5 

75  -99 

—12.5  75 

0.25  -12.5 
5 hi  0.25 
lOki 


In  this  example  we  apply  the  argument  reduction  (cf.  part  II)  on  S 
before  the  matrix  exponential  is  called.  The  reduction  matrix  Cis  triangular 
and  its  eigenvalues  are  near  the  real  suds.  It  may  worth  to  mention  that 
when  our  matrix  exponential  is  applied  to  Sk  directly,  the  relative  error  in 
fl (exp (S’*))  is  much  larger  than  those  in  fl  (exp(Ct)).  We  summarize  our 
results  in  Table  (4.3)  and  (4.4).  In  Table  (4.3),  we  list  the  relative  error  in 
/I  (ex p(C*))  in  the  third  column 


rel  btt— 


||/f(exp(Cfc))— exp(5fc)||1 
||exp(5fc)||i 


In  the  second  column,  ws  give  the  exponential  condition  number  ccmrt  of  S*. 
The  first  column  is  the  value  of  k.  In  Table  (4*4),  we  list  the  details  (the  ele¬ 
ments  of  the  matrix  exponential  and  the  condition  number  of  each  element) 
of  exp(50).  Again  the  p*  j  gives  a  fairly  good  indication  on  the  size  of  the  rela¬ 
tive  error  in  the  i,^-th  element  (except  the  diagonals  and  the  superdiago¬ 
nals,  which  are  computed  by  special  formula,  cf  [5]  for  the  special  formula  of 
the  first  divided  difference). 


c and  ret  err  :  c 


1.7 


8.0 


96.  ai 


138  34. 


192  44. 


492  |  96. 


/l(exp(S,s)) 

correct  uaZue 

pu 

rel  err/t 

(1.D 

a623180e-01 

5.063657C-01 

6.623169e-01 

5.063656C-01 

1.0c  4-00 

2.6e-0l 

(1.2) 

-1.21B954e-03 

-3. 132358e-04 

-1.219954C-03 

-5. 132350C-O4 

1.9e4-02 

7.7C-01 

(1.3) 

6.233662e-02 

2.14fll04e-02 

8.233648e-02 

2.146092e-02 

1.9c +02 

8.0e+01 

(1.4) 

-3.796594e-01 

-l.Q32820e-01 

-3.798592e-01 

-1.032819C-01 

2.0c +02 

9.9c +00 

(1.5) 

5.085295e-01 

1.025526c- 01 

5.065271C-01 

1.02SS22e-0l 

3. 0c +02 

0.Oe+Ol 

(1.6) 

-9. 670381 e-01 

-1.285246e-01 

-9.670222e-01 

-1.285228C-01 

1.2c +03 

2.4c +02 

9.649680e-01 

2.623748e-01 

9. 64 9660c- 01 

2.62 3749e-0l 

1.0c +00 

5.0e-01 

IPR1 

-1.300231e-03 

-2. 623670c- 04 

-1.300231c- 03 

-2.623670C-O4 

1.9e+02 

9.1C-01 

fflnS 

6.559493e-02 

0.750656c- 03 

6.559459C-02 

0.750609c- 03 

1.9e4-02 

8.9e+01 

(2.5) 

-3.948144e-01 

-2. 624261c- 02 

-3.9481 42c-01 

-2. 624261 e-02 

2.0c +02 

9.0c +00 

(2.8) 

5.222191e-01 

-1. 132139c- 00 

5.222168C-01 

-4.065758C-19 

3.8c +02 

7.0C+O1 

(3.3) 

9.912028e-01 

1.323518e-01 

9.912028e-01 

1.3235 10C-O1 

1.0c +00 

3.4C-01 

(3.4) 

•1.323517e-03 

-8.79?189e-05 

-1.323518c- 03 

-0.797166c- 05 

1.9c +02 

l.lc+00 

(3.5) 

6.617710e-02 

-9.313226c- 10 

6.617875e*02 

-3.252607c- 19 

1.9c +02 

e.Be+01 

(3.8) 

-3.946144e-01 

2. 624262c- 02 

•3.948142e-01 

2. 624261 e-02 

2.0c +02 

7. 8c +00 

(4.4) 

1.000000c +00 

0.000000e+00 

1.000000e+00 

0.000000e+00 

1.0c +00 

9.3e-10 

(4.5) 

-1.323517e-03 

8.797187C-05 

-1.323318e-03 

6.797166c-05 

l.Oc+02 

l.lc+00 

(4.8) 

6.559493e-02 

-0.750656c- 03 

6.55945 9e-02 

-8.758609e-03 

1.9c +02 

6.7e+01 

(5.5) 

9.912028^01 

-1.323510C-O1 

9.912026e-01 

-1.323510C-O1 

1.0c +00 

3.4C-01 

(5.8) 

-1. 30023 le-03 

2. 623676c- 04 

-1.300231e-03 

2. 623670c- 04 

1.9c +02 

6.6C-01 

(«.«) 

9.649660e-01 

-2.623740C-O1 

9.649660C-01 

-2.623749e-0l 

l.Oc+OO 

5.0C-01 

Table  4.4  The  elements  of  exp(9j)  and  their  sensitivities  . 
(Complex  number  is  represented  as  a  pair  of  integers.) 
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Appendix  I.  A.  Stability  Analysis  of  The  Schur  Decompo¬ 
sition 

Given  any  square  matrix  A,  there  exist  (by  Schur' s  lemma)  an  unitary 
matrix  U  and  a  triangular  matrix  5  (with  the  eigenvalues  arranged  in  any 
desired  order  along  the  diagonal)  such  that  -UH AU.  S  is  called  the  Schur 
form  of  A  (associated  with  U),  and  sometime  we  call  (S,  U)  a  Schur  decompo¬ 
sition  of  A.  Since  / (A)=  Uf  (S) Ua  for  any  analytic  function  /,  eA  may  be 
computed  by  UesUH;  accordingly,  it  suffices  to  work  on  S.  The  worry  in 
using  Schur  decomposition  to  compute  eA  is  whether  the  whole  process  is 
stable.  In  order  words,  if  the  computed  S  and  V  satisfy  A  »  USUH  and 
UUH  w  /  (whether  or  not  S  is  close  to  the  exact  Schur  form),  does 
*A  •  Ues  UH?  We  will  answer  this  question  in  this  section. 

Notations 

We  fix  some  notations.  Let  K(X)  denotes  the  condition  number  of  X  (for 
matrix  inversion),  i.e.,  jr(*)»!|Jf||*||JH,||.  Here  ||  ||  will  always  denote  the  2- 
norm,  ||j4||»  max  \\Ax  ||s.  We  also  use  the  spectral  abscissa  (or  the  index) 

a(i4)=max  {  Re(f):  (^spectrum (A)  j  (al) 

and  the  log  norm 

p(A)=max\tx\/iespactrunL((AH+A)/2)\.  (a2) 

The  log  norm  has  the  following  properties  (see  [3]): 

In<a)MM||. 

fi(A+B)*(4,A)+jj(B),  and  (a3) 

a(A)*n(A). 


28 


Bounds  on  exp(A) 

We  summarize  some  results  on  bounding  etA,  which  will  be  used  later.  All  the 
following  bounds  can  be  found  in  [3],  [9]  and  [  10]. 

(a)  Lower  bound. 

||8U||  i8t«(4)  ( t*0)  (a4) 

(b)  Norms  and  log  norms. 

11*** ||  i  ;S  e*IMII  (  f^O ).  (a5) 

(c)  Jordan  canonical  form.  If  A—XJX~x  and  m  is  the  maximum  size  of  the 
Jordan  blocks  in  J  ,  then 

||eM||imjr(Jf)-  max  (a6) 

'  "  v  QSrcm-l  r!  v  7 

(d)  Schur  decomposition  bound.  If  UHAU=D+N  where  U  is  unitary,  D  is 
diagonal  and  N  is  strictly  upper  triangular,  then 

(a7) 

Remark .  These  bounds  have  been  discussed  in  detail  in  [3].  One  can  always 
find  examples  that  favor  one  bound  over  the  others.  For  later  use  the  follow¬ 
ing  slightly  general  form 

l|BM||^(f)e‘'^  (a8) 

will  be  used  to  represent  the  bounds  (a5-7).  Thus  g(A)  can  be  the  index  or 
the  log  norm  of  A%  and  )  can  be  a  constant  or  a  polynomial  in  t . 
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Theorem  A.  1. 

If  A-USV=E  and  I-VU-F,  then 

t 

etA-Uets  V-  f  e^^A(EU-USF)erSVdT.  (a9) 

0 

Proof \  Let  X(t)^etA—Uets  V%  we  have 

X'(t)s^-X(t)=AetA-USetsV 

CL t 

~AetA—US-{F-¥VU)etsV 
-Ae  u  -  USFe *  V-  USV  (  Ue a  V) 

~ABtA- USFe ts  V-(A  -E) (  Ue a  V) 
~AX(t)+(EU-USF)ets-  V 

Thus 

*~*A(X'(t)-AX{t))=e-M(EU-USF)etsV 

or 

(e  -** X(t  ))'=e-*  (EU- USF)- ets  V 
Taking  the  integral  of  both  sides  from  0  to  t ,  we  have 

t 

e~tAX(t)= fe^A(EU- USF)erS  dr. 

0 

The  theorem  follows  by  multiplying  eu  from  the  left  to  both  sides  of  the 
above  equation.  • 

Corollary  A.  2. 

||.M_y,lSK||i||^|||y||(||^||  +  ||SF||)t.el.mMM4Wm  (alQ) 

||«M-C/8*K||<||7||  ||  Cflldin  |+||5/’||)fJ>(f).t»-*<-W).«(5»  (all) 

where  p(t)  Is  either 


^iL 


I 


-  mtiinMl 


(a),  a  polynomial  of  degree  less  than  m1+m2— 1*  where  ml  and  mz  are  the 


maximum  sizes  of  Jordan  block  of  A  and  S  respectively, 

(b).  a  polynomial  of  degree  less  than  — 1  with  the  constant  term  equal  to 

1. 

Proof.  Assuming  ||eM|| Theorem  A.  1  implies 
t 

\\eiA-UetsV\\*zfpA(t 

o 

t 

*\\EU- USF 1 1 1 1  V\ |/e <*  ^ w  +** {S)  pA (t  -t)  ps (r)d r. 
o 

Since 

(t  ~r)g(A)+rg(S)  <:  ((t -r)+r)maxfa (A),g  (S)} 

and 

HW-OSFII  *  l|V||||C/||(||£’+||5/,||). 

we  have 

||8M-t/ewK||s||Vl|||C/||(||^||+||Sr||)/pi,(t-T)ps(T)dTe"«f^)-»(5W. 

0 

The  corollaries  follow  by  choosing  suitable  Px(f)  and  g(A)  according  to  (a5- 


Stability  of  Schur  Decomposition. 

When  the  Schur  decomposition  USUH  is  used,  the  computed  U  satisfies 
Corollary  A.2  implies  that  if  \\A  —  UHSU\\^t\\A\\  and 
\\J—UUN\\^t.  then,  for  some  polynomial p(f). 
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This  shows  that  whether  S  accurately  approximated  the  Schur  form  of  A 
does  not  matter;  as  long  as  USUH  is  close  to  At  Ues  UH  is  close  to  eA. 

The  corollary  also  shows  the  possible  advantage  of  using  Schur  decom¬ 
position  to  Jordan  decomposition  (in  computing  eA)  when  A  has  an  ill- 
conditioned  eigensystem.  If  one  uses  Jordan  decomposition,  then 
II  £/|lll  ^11  =  11^11  \\R-l\\*K(R)  where  R  is  the  matrix  that  transforms  A  to  Jor¬ 
dan  normal  form  :  R~lAR=J.  Here  K(R)  could  be  enormous. 
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Appendix  B.  A  Different  Proof  of  Corollary  2.4. 


It  is  well  known  (see,  e.g.  Bellman  [l])  that  a  fundamental  matrix  $  for  a 
system  of  O.D.E.s  with  constant  coefficients  x'=Sx  is  given  by 

*<o=«*  a*!  <-). 

The  solution  <p  (vector  function)  with  given  initial  vector  ^(0)  looks  as  follows 

(P(f)  =  e^-^O)  (U|  <•)• 

Similarly,  ^(f  )=B<r^  ^(0)  is  the  solution  of  x'=r(S)x.  It  is  not  difficult  to 
see  then  that  )e5|^er^  is  equivalent  to  i/(t )&|  99(f)  |  for  any  V'(0)&|^(0)l- 
Thus  Corollary  2.4  follows  from  the  theorem  below. 

Theorem  B.l.  If  y>(t )  and  V(0  are  the  corresponding  solution  of  the 
differential  systems  x’  =  Sx  and  x’  =  T(S)x  (S  is  a  n  by  n  triangular 
matrix),  and  if  V'(O)  &  1 99(0)  |  (element-wise),  then 

*(f);>|p(f)|  (f&O).  (bl) 


Proof.  Since 


— - Oi'i-ipiit)  +  SatJ'PiCO  (x— 1,2 . n), 

ac  \<j 


(b2) 


we  have,  multipling  both  sides  of  (b2)  by  and  rearranging  the  terms, 


at  i<j 

Similarly, 

A.(e-^^l(f)]  =  2|o1J|.[e"Re“‘‘‘ ^j(0].  (b4) 

ar  i<* 


When  i=n. 
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1>n  (t) =8 "R“n  n  ‘  (0)  >  i  ®  '**•’*'  ' P* (0)  |  =  |  (* )  I  • 

So  for  k=n  ^  (t  )&  |  p*  (£  ) )  is  true.  Assume  that  the  inequality 
is  true  for  i<k^n.  Then 


i-[.  )]=  21 «.  J  l«  ) 

a  •<"/(<)! 

i<J 

Since  (0)3:  |  ^  (0)  | ,  by  Theorem  3.7, 

•  )fc  |  a  V(*  )  I  • 

Which  implies 


By  the  principle  of  induction,  Vt(0^(?<(^)l  f°r  every  i  ;  hence  proving  the 
theorem.  ■ 


34 


REFEJRENCES 

[1]  R.  Bellman,  Introduction  to  Matrix  Analysis,  McGraw-Hill,  New  York, 
1969. 

[2]  A.O.  Gel’ f and,  Calcxdus  of  Finite  Differences,  Hindustan.  India  (1971). 

[3]  Bo  KSgstrom,  Bounds  and  perturbation  bounds  for  the  matrix  exponen¬ 
tial,  Bit  17 , 1977,  39-57. 

[4]  C.C.  MacDufFee,  'The  Theory  of  Matrices",  Chelsea,  New  York,  1956. 

[5]  A.  McCurdy,  K.C.  Ng  and  B.N.  Parlett,  Accurate  Computation  of  Divided 
Differences  of  the  Exponential  Function.  1983.  (to  appear) 

[8]  B.N.  Parlett,  Computation  of  functions  of  triangular  matrices,  ERL- 
M481,  UC  Berkeley.  1974. 

[7]  J.R.  Roche,  On.  the  sensitivity  of  the  matrix  exponential  problem, 
R.A.I.R.O.  Analyse  numerique/Numercal  Analysis,  Vol.  15,  n°3,  1981, 
p.249-255  . 

[8]  J.D.  Stafney,  Functions  of  a  Matrix  and  their  norms,  linear  Algebra  and 
its  Application  20,  87-94  (1978). 

[9]  C.  VanLoan,  A  study  of  the  matrix  exponential.  University  of  Manchester 
numerical  analysis  report  10,  1975,  Manchester,  England. 

[10]  C.  VanLoan,  The  sensitivity  of  the  matrix  exponential,  SIAM  J.  NUMER. 
ANAL.,  Vol  14,  No.6  December  1977,  p97 1-981. 

[11]  J.H.  Wilkinson,  Rounding  Errors  in  Algebraic  Processes,  Prentice  Hall 
(1963). 


Part  II. 


Matrix  Argument  Reduction 
and  its 

Application  to  Computing  Exp(S) 
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5.  Introduction 

One  common  and  important  technique  in  computing  periodic  functions 
(like  exp,  sin  and  cos)  is  argument  reduction,  which  reduces  a  given  value  x 
to  an  equivalent  value  in  a  standard  range.  Kg. 

sin(lOO)= sin(  100-32rr)=sin(-0. 530964-  ■• ). 

This  technique  can  be  extended  to  matrices.  For  periodic  functions  like 
exp  and  sin,  one  can  reduce  a  matrix  B  to  some  C  whose  eigenvalues  lie  in  a 
standard  range.  This  technique  is  useful  in  the  matrix  exponential  problem, 
for  exp  is  periodic  on  the  imaginary  axis  and  by  working  with  C  one  needs 
only  consider  matrices  whose  eigenvalues  are  near  the  real  axis. 

In  this  part,  we  restrict  our  attention  to  triangular  matrices1, .  We  begin 
with  a  formal  definition  of  matrix  argument  reduction.  Then,  in  section  7,  a 
numerical  method  for  the  reduction  on  triangular  matrices  is  described, 
finally  we  discuss  the  application  of  argument  reduction  to  the  computation 
of  the  matrix  exponential  in  section  8. 


1  Any  matrix  B  is  uni  tartly  similar  to  its  Schur  form  S ,  which  is  triangular. 


R« 


t 
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6.  Argument  Reduction  for  Matrix  Functions 

The  objective  of  this  section  is  to  define  an  equivalence  relation 
"congruence  modulo  <•>"  on  square  matrices.  It  characterizes  matrices  that 
have  the  same  image  under  certain  periodic  matrix  functions.  More  pre¬ 
cisely,  if  C  is  congruent  to  B  modulo  a,  then  /  (C)=/  (f?)  for  function  /  of 
period  a  As  a  consequence,  a  given  argument  B  may  be  replaced  (for  the 
purpose  of  computing  f(B))  by  some  other  matrix  with  more  desirable  pro- 
perties. 

6.1.  Integral  Representation  of  Congruence 

Given  complex  numbers  77,  £  and  cj*0,  we  say  77  is  congruent  to  {  modulo  o  if 
(17— {)/  0  is  an  integer;  and  using  the  notation  of  Gauss  we  write 

(mod  u)  .  (6.1.1) 

(If  {  is  not  congruent  to  77,  we  shall  write  {*77  (mod  &>).)  When  there  is  no 
doubt  concerning  the  modulus,  the  mod  a  of  the  formula  may  be  omitted  . 
Notice  that  if  (mod  u)  then  some  integer  k  must  exist  such  that 
T7=f— Jfcw,  and  conversely.  Hence,  for  any  periodic  function  /  with  period  a, 
we  have 

/fa)=/(^M=/(a  (8.1.2) 

By  the  Cauchy  integral  formula,  rj  does  have  the  following  rather  complicated 
representation 

where  P  is  a  simple  closed  contour  (positively  oriented)  that  contains  f  in  its 
interior.  For  matrices,  an  analogue  of  the  Cauchy  integral  formula  is  the 
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Dunford-Taylor  integral.  For  any  polynomial  p(t)  it  can  be  proved  that 

P (B)  =  g 

where  T  is  a  closed  contour  that  enclosed  all  eigenvalues  of  A  (see  Kato  [0], 
p.44).  By  analogy,  this  integral  is  used  to  define  p{B)  ,  even  if  p(t)  is  not  a 
polynomial  but  an  analytic  function  with  no  singularity  inside  nor  on  I\  as  fol¬ 
lows.  Let  Xlf  .  .  .  ,  Xm  be  the  eigenvalues,  not  necessarily  all  distinct,  of  a 

given  nxn  matrix  B.  Let  . rw  be  a  collection  of  positively  oriented 

simple  closed  contours.  We  do  not  assume  these  curves  are  disjoint;  they 
may  cross.  We  do  assume  that  each  eigenvalue  X*  lies  on  no  curve  Pf,  and 
lies  inside  just  one  of  those  contours. 

Let  <p(t)  be  a  function  analytic  inside  and  on  each  curve  I\;  if  ^  is  mul¬ 
tivalued  then  choose  a  branch  of  for  each  I\,  not  necessarily  the  same 
branch  for  every  I\.  Then 

*(*>*3=-  1 C. 

2ra  ur, 

Of  course,  different  choices  of  branches  for  <p  on  I\  may  result  in  different 
value  of  In  particular,  we  can  choose  f  to  be  a  "oshift”  function 

p(z)=z-kjO  inside  and  on  I/,  for  any  integers  ki%  to  obtain  the  following 
definition : 

Definition  6.1.  C  »  B  (  mod  a)  just  when  C  is  a  o-shift  function  of  B  : 

c=<p(B)=  -  / r«H<r-B)"dt.  (6.1.4) 

ur« 

Note  that  the  eigenvalues  of  C  are  p(Xj)=Xy  —kju.  Therefore,  each  member  of 
the  class  of  matrices  congruent  to  B  is  determined  by  the  set  of  integers 

. 


f 
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EXAMPLE.  For  any  matrix  B  and  any  integer  k , 

B—koI—B  (mod  cj). 

This  follows  from  choosing  p(*)=z  -ku  inside  and  on  every  I*. 

In  order  to  show  that  "  =  (mod  u)  "  is  an  equivalence  relation  on 
matrices,  we  need  the  following  lemma. 

Tummii  0.1.  for  any  square  matrix  T. 

g(.()=9i(9z(0)  implies  g(T)=gx(g2(T))  (6.1.5) 

where  g2  is  holomorphic  in  a  domain  that  contains  the  eigenvalues  of  T,  and 
gx  is  holomorphic  in  a  domain  that  contains  all  the  eigenvalues  gzO^.)  of 
gz(T).  sog(T)  may  be  defined  by  the  Dunford  integral. 

Proof.  See  Kato  [0],  p.45  .  • 

Theorem  6.2.  The  congruence  relation  on  matrices  is  an  equivalence 
relation;  Le„ 

B-B  (mod  a).  (6.1.6) 

C=B  (mod  cS)  implies  5=C  (mod  o),  (6.1.7) 

A—B  (mod  a)  and  B—C  (mod  w)  implies  A  =  C  (mod  a).  (6.1.8) 

Proof.  (6.1.6)  is  obvious  (A:  =0).  For  (6.1.7),  let  g2~<p  and  define  9\-<p~x  : 
<p~l(z)=z  inside  and  on  Yf.  Then  0iG72(2))=p~V(2)=2  f°r  2  in  T*. 

<=1 . m.  Thus  C&B  (mod  o)  =>  C=<p(B)-g  2{B).  So 

B=gx(g2{B))=gx(C)=v-\C). 

Since  <p~x  is  a  o-shift  function,  by  definition  B=C  (mod  a).  The  proof  of 
(8.1.8)  is  similar  :  if  A-<px{B)  and  B=Vz(c)>  then  ?(2)xPi(?s(2))  is  also  a  o- 
shift  function.  Hence  A=<p(C)  implies  A^C  (mod  a).  • 


t 
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The  usefulness  of  the  congruence  relation  on  matrices  lies  in  the  follow* 
ing  theorem. 

Theorem  6.3.  For  any  /  holomorphic  in  the  complex  plane  with  period 
u,  if  C—B  (mod  w)  then 

f(C)=f(B),  (6.1.9) 

Proof.  The  theorem  follows  from  choosing  g \-f  and  gz~ip  in  Lemma  6.1. 
Since  the  period  of  /  is  o,  /(*)=/(*»(*))  for  any  o-shift  function  <p.  Thus, 
f(C)=f(v(B))=f(B).  - 

6.2.  Argument  Reduction  for  periodic  Function 

Let  MOD(f?,  a)  denote  the  class  of  matrices  that  are  congruent  to  B  modulo 
6).  For  any  holomorphic  function  /  with  period  u,  the  computation  of  f(B) 
might  be  difficult  when  B  is  large  in  certain  norm;  especially  when  series  or 
polynomials  must  be  used.  Theorem  6.3  enables  us  to  pick  any  C  from 
MOD(i?,&>)  and  compute  /(C)  instead-  Thus,  one  can  reduce  the  given  argu¬ 
ment  B  by  choosing  from  MOE(B,v)  a  matrix  smaller  in  norm.  We  call  such  a 
process  argument  reduction.  For  example,  one  may  choose  C  to  have  eigen¬ 
values  ft  that  are  smaller  in  magnitude,  yet  congruent  to  the  corresponding 
Vi  for  instance,  take  ft where  / fc*  is  chosen  to  minimize  |Xt-fccj|  over 
all  integers  k.  More  generally,  we  may  ask  the  following  question  : 

Given  a  matrix  norm,  can  we  characterize  the  matrix  C  of  minimum 
norm  in  MOD(Z?,u)  ? 

It  turns  out  that  such  a  C  depends  on  B  and  the  given  norm  in  a  compli¬ 
cated  way  and  may  not  be  unique. 


EXAMPLE.  Let 
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B= 


0.B  t 
0  0.4 


Since  B  is  triangular,  any  C eM0D(Z?,cj)  is  of  the  following  form  : 


C= 


0.5— k  i 
0 


x„(0.8-fc,)-(0.4-fc2) 
0.8— 0.4 
0.4— kg 


This  form  is  dictated  by  commutativity  CB=BC,  see  equation  (6.2.1).  Now, 
for  different  values  of  t  and  norm,  the  minimizing  C  is  as  follows 

Case  (a),  t  =0,  any  of  the  norms  ||  ”11  it  j  =  1,2, »  ; 

C=diag(-0.2,0.4)  (A:  4= 1  and  fc2=0). 


Case  (b).  t  =0.4,  1-norm  (  ||  C||  ,=max  |  a^j  | ); 


C-B- 


0.8  0.4 
0  0.4 


(fci=fcz=0). 


Case  (c).  t  =  1,  “-norm  (  ||  C\\  .=max§  |  cl  .•  |  ); 


-0.2  1 
0  -0.6 


(k  i=kg— l). 


In  general,  if  B  = 


Xi  t 

0  Xj 


and  a— 1,  then  the  (1.2)  element  of  C  is  equal 


to 


‘x — — 


=  <x(l  - 


^1  ^2  V 

Xi-Xj 


Thus  when  t  j*0  and  Xt  and  Xg  are  sufficiently  close,  the  (1.2)  element  must  be 
large  unless  fc,=fcg.  This  suggests  that  it  may  be  a  good  idea  to  constrain 
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kl-ks  if  X,  and  Xg  are  close  together. 

In  computing  f(B),  if  the  Schur  form  S  of  B  is  used,  then  argument 
reduction  of  a  triangular  matrix  is  required.  For  a  triangular  5  any  C  in 
M0D(5,w)  can  be  computed  by  a  variation  of  Parlett's  recurrence  [5],  pro¬ 
vided  the  eigenvalues  (diagonal)  of  C  are  given.  We  shall  be  content  with  a  C 
that  minimizes  the  sizes  of  the  eigenvalues  under  the  constrain  k^—kj  if  X*  is 
close  to  Xy .  The  numerical  method  in  the  next  section  is  based  on  the  follow¬ 
ing  consequences  of  Definition  6.1 : 

If  C~B  (mod  &>),  then 

(a).  BC=CB\  (8.2.1) 

(6).  {Q-'CQ)~Q"BQ  (mod  «).  (6.2.8) 

Both  follow  from  the  fact  that  C=p(2?)  :  B<p(B)=<p(B)B  implies  (a)  and 
Q-lV(B)Q=<p(Q-lBQ)  implies  (6). 


6.3.  Remark 

It  is  possible  to  define  congruence  via  the  Jordan  form.  Let  /A  denote  a  Jor¬ 
dan  block  with  eigenvalue  X.  Given  any  B,  let 

Q-'BQ  =  /  =  diag(/v/v . J 

be  its  Jordan  normal  form  (in  this  case,  some  of  the  \  may  be  equal),  then 
by  Lemma  6.4  below  the  only  matrices  that  are  congruent  to  J  are 
/'sdiag^x,-*^ . .  where  are  some  integers  (the  only  require¬ 

ment  on  these  integers  is  kt=kj  if  X<=X;).  Consequently,  C-B  (mod  a)  if  and 
only  if  C  is  of  the  form  QJ'Q rl. 
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Lemma  6.4.  The  only  matrices  that  are  congruent  to  are 
Jb=0.±l.+12.-  . 

Proof.  FVom  (6.1.4),  if  C  is  congruent  to  J\,  then  for  any  simple  closed 
contour  T  that  encloses  X,  we  have,  for  some  integer  k , 

d*  =  -  ^kuizI-JJ-'d*  J 

•  Jy-kol. 

Therefore,  the  only  matrices  that  are  congruent  to  J\  are  J\—k  aI-Jx-ku 

(fc=0,±l,±2, 


/ 
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7.  A  Numerical  Method  for  Matrix  Argument  Reduction 
on  Triangular  Matrices 

In  this  section  a  basic  method  for  computing  C=5»(5)€M0D(S,cj)  for  a 
given  triangular  matrix  S  and  a  given  <p  (see  (8.1.4))  is  described.  This 
method  is  unstable  if  some  eigenvalues  of  S  are  almost  equal  but  not  con¬ 
tiguous  on  the  diagonal.  To  avoid  this  situation  the  diagonal  elements  must 
be  moved  into  a  suitable  order  (by  unitary  similarity  transformations). 

7. 1  The  Recurrence  Method 

Let  S  be  given.  For  convenience,  let  Xi=Si.i  (i=  1 . rt)  be  the  eigenvalues  of 

S.  Let  the  o-shift  function  <p  be  defined  by  flp(\t )=\—kicJ  (i  =  l . n),  with  the 

constrain  if  Xi=A^.  Since  S  is  triangular,  C=<p(S)  is  also  triangular  and 

the  eigenvalues  (diagonal  elements)  of  C  are  (i  =  l,...,n).  Given  the 

set  of  integers  . Jfc*],  a  simple  way  to  compute  C  is  to  use  a 

recurrence  based  on  the  commutativity  of  C  and  5  (cf.  [5]).  There  are  three 
cases : 

case  1.  Distinct  eigenvalues.  When  all  A*  are  distinct,  C  is  completely 
determined  by  its  diagonal  elements  :  comparing  the  (ij)  element  (with 
j>i )  of  both  sides  of  0 =SC-GS, 


k*0 


Hence 

y-i-i 

=  2  “  *U-*‘Cy-*j)/(Xt-Aj).  (7.1.1) 

*•0 

Thus  C  can  be  generated  from  the  diagonal  to  the  upper  right  hand  comer. 


f 


case  2.  Contiguous  coincident  eigenvalues.  When  Xr=\r4.l=X#,  the  ele¬ 
ment  is  just  equal  to  (for  r^i<j^s).  To  see  that,  it  suffices  to  con¬ 
sider  the  case  when  r  =  l  and  s=n  (Le.,  all  the  eigenvalues  of  S  are  equal). 
There  is  only  one  simple  contour  T  that  encloses  all  of  the  X*.  By  definition, 


C  =  f (z  ~kZm)  (zI~B)-ldz 

dm  p 

“  S  ok  I. 


(7.1.2) 


Hence, 


cij  ~sij  for  i  <}. 


(7.1.3) 


ca3C  3.  Scattered  coincident  eigenvalues.  When  equal  eigenvalues  are 
not  contiguous,  the  recurrence  breaks  down.  For  example,  formula  (7.1.1) 
yields 


when 


2  1  1 

5  =  0  1,  cj=1  and  c4 1=0  for  i  =  1,2,3. 

2, 


But  the  (1,3)  element  of  C  is  well-defined  !  To  remedy  that,  we  need  to  move 
the  diagonals  of  5  so  that  the  coincident  eigenvalues  are  contiguous  (and 
hence  formula  (7.1.3)  can  be  employed).  There  exist  (complex)  plane  rota¬ 
tions  which  exchange  adjacent  diagonal  elements  and  preserve  triangularity. 
So  it  is  possible  to  find  a  sequence  of  rotations  in  plane  (i,i  + 1)  that  bring  any 
coincident  eigenvalues  together  (see  section  7.2).  Thus  the  computation 
reduces  to  case  1  and  2. 


Remark  1.  In  gerr  al  (7.1.3)  is  true  as  long  as  kt=k  for  any  That 
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means  that  if  fci,fcj+1,  .  .  .  ,kj  are  all  equal,  the  (i,j)  element  of  C  is  given  by 
(7.1.3),  at  no  expense. 


EXAMPLE. 


To  see  how  (7.1.1)  and  (7.1.3)  work,  let  us  compute  C  where  u=l  and 


2.101  10  0 
0  10 
2.1  1 
2.1 


(7.1.4) 


We  pick  the  diagonal  of  C  to  be  (0.101,  0.0,  0.1,  0.1)  (corresponding  to 
A:  i=2,  *2=0,  *3=2  and  *4=2,  which  are  chosen  to  minimize  the  absolute  values 
of  sit  — *fCj).  Using  (7.1.3)  for  the  (2,3)  element  and  (7.1.1)  for  the  rest,  we 
have  (answers  are  correct  to  5  significant  decimal  digits) 


C  = 


0.101  0.048072  0.45330 
0  0.047619 

0.1 


-0.21586 

0.45351 

1 

0.1 


(7.1.5) 


Remark  2.  When  5  significant  decimal  digit  floating  point  arithmetic  is  used 
in  the  above  example,  cancellation  occurs  and  instead  of  (7.1.5)  we  get 


/f(C)= 


0.101  0.048072  0.45300  0.51000 
0  0.047619  0.45351 

0.1  1 
0.1 


(7.1.6) 


Notice  that  the  (1,4)  element  lost  all  its  digits!  This  shows  that  the 
recurrence  is  unstable  if  close  eigenvalues  are  not  adjacent  to  each  other.  To 
avoid  this  problem  eigenvalues  that  are  close  together  should  be  treated  as  if 
they  were  equal  and  therefore  must  be  contiguous  and  have  the  same  *  so 
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that  formula  (7.1.3)  can  be  used  (cf  Remark  1). 


7.2.  Complex  Plane  Rotation 


Reordering  the  diagonals  of  S  can  be  done  by  successive  complex  rotations 
(or  reflections)  in  planes  (i,i+l).  Let 


a  c 

Si.< 

si.i+l 

0  6 

0 

si  +  l.i+l 

Our  problem  is  to  find  an  unitary  matrix  U  to  swap  a  and  b : 


Ua- 


a  c 
0  b 


U  = 


b  c 
0  a 


(7.ai) 


Theorem  7.1.  Suppose  a/6  (otherwise  [/=/),  let  r  =  — ^  .  The  follow¬ 
ing  U  satisfies  (7.2.1) 


U  = 


Vi+|r|a 

Proof.  First  of  all  U  is  unitary,  for 


r  — r/r 
1  r 


(7.2.2) 


1 

r  —r/r 

1 

r  1 

1  0 

Vl+|r  Is 

1  r 

Vl+|r|2 

—t/  r  f 

0  1 

To  show 


a  c 
0  b 


b  c 
0  a  1 


we  multiply  out  both  sides. 


ar +c 

cr-ar/r 

br 

rc-ar/f 

b 

br 

b 

c  +ar 

I 
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where  X=(xij)  is  a  triangular  matrix  defined  as  follow; 


= 


ki  if  i—j, 

0  if  ki=ki+l=--=kj 

fe.i+fcSt+aj  “  Sij-txj-kj)/(**—kj)  if 

fcaO 


(7.3.2) 


The  formulae  in  (7.3.2)  come  from  forcing  SX=XS.  To  verify  C=S—uX, 
notice  that  by  definition 


diag(C)=diag(5’  —uX). 


That  is,  Cij=(S—uX)ij  when  j~i=0.  Assume  cij=(S—oX)ij  for  j-i<k,  life. 
Consider  (S-oX)ij  for  i+j -k.  There  are  two  cases  : 

(1)  if  ki=ki+1=  -  =kj ,  then  (7.3.2)  and  remark  1  in  §7.1  imply 

(S-oX)ij  =  Sij  -  0  =  Cij  ; 


(2)  when  k^kj,  the  relation  XS-SX  implies  S(S  —oX)=(S  —aX)S  and  hence 
the  element  (S  —uX)ij  is  determined  uniquely  by  the  recurrence  for¬ 
mula  (7.1.3).  Therefore  by  induction  assumption  cij=(S—uX) tj. 

Both  cases  yield  Ctj  =(S— uX)ij  for  j—i=k;  hence,  by  induction,  the  equality 
holds  for  all  j  — i&O. 

Formula  (7.3.1)  is  somewhat  preferable  to  (7.1.1)  and  (7.1.3).  Often  a  is 
not  exactly  representable  in  a  computer,  and  (7.3.1)  puts  the  arithmetic  that 
involves  o  into  the  last  step,  thus  preventing  the  propagation  of  roundoff 
error  due  to  the  inexact  representation  of  u. 

When  the  equal  k^'s  are  not  contiguous,  we  can  find  an  unitary  transfor¬ 
mation  U  (e.g.  a  product  of  complex  plane  rotations)  such  that  S’=  UHSU 
has  confluent  diagonals.  Let  X'  be  computed  according  to  (7.3.2)  with  S 
replace  by  S’.  We  have 
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C=<p(  US'UH)=  U<p(S')  u» 

~  U(S -uX‘)  UH =S -a  UX'  U H . 


Thus  C  retains  the  form  C=S—uX  where  X~UX'UH. 


As  an  example,  we  compute  the  example  in  §7.1  again  (using  5 
significant  digit  floating  point  arithmetic  only)  by  swapping  the  first  and  the 
second  diagonal  elements  before  the  reduction.  Recall 


2.101  10  0 
„  0  10 
S=  2.1  1 
2.1 

We  have 


UHSU  =  S'  = 


0  1  0.90293  0 

2.101  -0.42976  0 
2.1  1 
2.1 


where  (using  (7.2.1)  with  r=— 0.47596) 


-0.42976  -0.90293  0  0 

0.90293  -0.42976  0  0 

0  0  10 

0  0  0  1 


Thus  the  corresponding  X'  of  S'  is  equal  to  (by  (7.3.2)) 


0 


X  = 


0.95193  1.0548  -0.50229 
2  0  0 
2  0 
2 


Transform  it  back  and  subtract  it  from  S  to  find 


r 


so 


fl{C)=S~UX'UH  = 


0.101  0.04807  .45331  -.21588 

0  0.047589  .45353 

0.1  1 

0.1 


Recall  the  correctly  rounded  C  (to  five  decimal  significant  digits)  is 


C  = 


0.101  0.048072  0.45330  -0.21586 
0  0.047619  0.45351 

0.1  1 
0.1 


Here  that  the  (l,4)-th  element  is  correct  up  to  the  last  digit  (  compare  the 
completely  wrong  answer  in  (7.1.6)).  Each  element  in  /1(C)  is  in  absolute 
error  by  at  most  3xl  0-3,  which  is  approximately  the  rounding  error  of  the 
largest  element  in  C. 

It  seems  that  the  recurrence  formula  (7.1.1)  always  yields  accurate 
results  as  long  as  X<  and  Xj  are  not  close  to  each  other.  It  worths  to  re-apply 
(7. 1. 1)  once  more  for  computing  those  aq  j  such  that  fc*  xkj  after  the  whole  X 
has  been  computed. 


7.4.  An  Algorithm  for  Computing  C 

Section  7.3  suggests  the  following  algorithm  to  compute  C: 

Algorithm  liAR  (Matrix  Argument  Reduction). 

[1]  Determine  (<=1 . n)  so  that  \  —fyo  is  small  in  magnitude. 

[2]  Order  the  (fc()  so  that  equal  k's  are  contiguous; 

[3]  Find  U  so  that  UHS(J=S‘  has  its  diagonals  in  the  new  ordering; 

[4]  compute  X’  according  to  (7.3.2)  for  S'; 

[5]  transform  back  X:=UX’  UH; 

[8]  recompute  X  by  (7.3.2)  on  applicable  elements; 
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[7]  compute  C:=S-oX. 

Step  [j].  As  we  have  mentioned  in  section  7.1,  each  A^  should  be  chosen 
to  minimize  |  — fc* o |  under  the  constrain  whenever  A*  is  close  to  Aj. 

In  general,  we  say  \  and  Xj  are  close  together  if  A*—  \j  is  less  than  O.Icj. 

Step  [<?].  There  are  many  ways  to  permute  ifclffc2 . A^J  so  that  equal 

ki*s  are  contiguous  (let  us  call  them  confluent  permutations  of 
\kukZl  .  .  .  ,A^j).  Since  every  swap  takes  4(n— 2)  (complex)  floating  point 
operations  (cf.  §7.2),  one  would  like  to  find  a  confluent  permutation  that 
requires  fewest  swaps  to  bring  the  fc*  to  the  new  permutation.  It  turns  out 
that  this  task  is  equivalent  to  the  acyclic  subgraph  problem  (cf.[2]),  which  is 
known  to  be  NP  complete  (i.e.,  exponential  time  is  probably  needed  to  find 
the  minimal  solution).  In  Appendix  II.A,  we  show  that  one  can  easily  find  a 
confluent  permutation  which  requires  fewer  than  n(n— 1)/4  swaps.  For  our 
purpose  this  is  quite  satisfactory. 

Step  [3],  Apply  the  complex  rotations  (in  §7.2)  to  transform  5  to 
S*=UHSU  according  to  the  new  permutation  of  [kitk2 . A^j.  The  infor¬ 

mation  of  the  transformation  matrix  U  should  be  stored  for  later  use  (step 

[5]). 

Step  [4],  [5].  These  steps  are  easy  to  implement  :  X*  can  be  computed 
according  to  (7.3.2)  ;  then  perform  the  inverse  transformation  on  X '  to  get 
X=UX'UH. 

Step  [5].  This  step  can  be  best  described  by  the  following  algorithm  : 
for  i=  1  to  n  ,  j  =i  to  n  do 

>-<-i 

if  (A^A;,)  then  x*j=  £  (ziMh  $ukj  -  ^-mV^-A*). 

*■0 


! 
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Step  [7].  Finally  compute  C=S-o>AT. 


Operation  count  and  sforaye.  The  major  work  is  in  steps  [3],  [4]  and  [5]. 
Since  the  upper  bound  on  the  number  of  swaps  is  n(n— 1)/4  (  cf.  Appendix 
Il.A)  and  each  swap  takes  4(n—2)  ops,  there  are  at  most  n(n— l)(n  —  2)  ops 

for  each  of  [3]  and  [5j.  Step  [4]  and  [6]  together  take  only  — —  ops  .  Hence, 

u 

p 

the  total  number  of  ops  needed  is  at  most  2~-n3  .(This  is  the  worst  case 

analysis.  In  general  the  number  of  swaps  needed  is  very  few,  possibly  at  most 
0 (nz)  in  average*.  For  example,  if  all  kt  are  distinct,  then  no  swap  is  needed  !) 

For  storage,  it  is  possible  to  implement  MAR  without  using  any  matrix 
storage  besides  5  provided  one  uses  (7.1.1)  and  (7.1.3)  instead  of  (7.3. 1-2)  in 
step  [4].  However,  formulae  (7.3. 1-2)  are  recommanded  and  it  would  be  con¬ 
venient  to  have  an  extra  working  matrix.  This  storage  requirement  is  not 
unreasonable  since  the  objective  of  applying  MAR  is  the  computation  of  f(S)t 
so  there  should  be  at  least  an  available  array  waiting  for  the  function. 

A  fortran  program  of  this  algorithm  is  given  in  Appendix  ILB. 


7.5.  A  Simplified  Formula  for  xl  3  when  ki=k3. 

When  kx=k2  but  kx*k2  (if  kl=k2=k2  then  x13=0),  it  is  possible  to  compute  x13 
by  a  single  formula  which  suffers  no  cancellation. 


*  If  one  uses  QR  algorithm  to  get  the  Schur  form,  then  the  equal  eigenvalues  tend  to  cluster 
together  on  the  diagonal.  The  worst  situation  simply  won't  happen. 
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Theorem  7.2.  For  kt=k3  and  k^kg. 


_  *12*23^2-*!) 

113  (S22-SU)(S22— Sga)  ' 


Proof.  First,  we  swap  sgg  and  sgg  in  S.  Take 


P  = 


10  0 
0  1*„  It  12 
0  U21  1*22 


where 


«n 

*t  12 

1 

r  —r/f 

r=  Sffl 

1*2  j 

1*22 

Vi+Pl* 

1  r 

S33-S22 

We  have 


S'  =  PHSP  = 


*11 


*33  *23  * 
*22 


According  to  (7.3.2)  (with  fci=*3  in  mind). 


*11  *  * 

*1  0  X’i3 

S'  - 

*83  *23 

II 

H 

t 

*3  *'23 

*22 

for  some  x’l3  and  x‘23-  Transforming  it  back  we  have 


*1  **lrfSZ12  **13°22 

kl  X|2  X,3 

PX’PH  =  X  = 

*2 

*1 

kg  Xgg 

k i 

A*  — Jfc 

Since  x12  =  Sia,  8  =  *'is®i2 

*22—*  11 


Cea  _  F  __  *~*»a 

C18  /  fj  *3S“*«S 


(7.5.1) 


c 

x’ 13^22 =X'13U1Z(- - ) 

U?2 

_ _ (^2~*0  ®Z2 

12 - - - 

*28-sU  <Zi2 

_  «ia»2a(fca-fe|) 
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8.  Application  to  Matrix  Exponentials 

&  1.  The  Matrix  Exponential 
It  is  well  known  that 

g*4»2ivi  =;  e*  . 

therefore,  with  6J=2m  Theorem  2.3  asserts 

es  =  ec  where  C^tp(S)  €  MOD (S.u). 

So  one  can  apply  the  algorithm  in  the  previous  section  and  replace  S  by  C 
for  computing  the  matrix  exponential  (in  this  case,  C=tp(S)  is  chosen  to  have 
eigenvalues  close  to  the  real  axis).  There  are  two  advantages: 

(a)  The  norm  of  C  is  usuailv  smaller  than  the  norm  of  S;  therefore,  C  is  less 
sensitive  than  S  to  the  roundoff  error  in  the  calculation  of  matrix 
exponentials;  especially  when  the  eigenvalues  of  S  have  large  variation 
of  imaginary  parts; 

(b)  the  imaginary  parts  of  the  eigenvalues  of  C  are  bounded  in  magnitude. 

Property  (b)  is  important  in  the  accurate  computation  of  the  exponen¬ 
tial  divided  differences  (cf.  [4]).  Because  of  the  bounded  imaginary  parts,  the 
coefficients  Af(Z)ex p  of  the  Newton  polynomial 

ec  =  A?(Z)exp  /  +’SV(^)exp  f[(C-0/)  (8.1.1) 

can  be  computed  accurately;  consequently  (8.1.1)  becomes  one  of  the  most 
accurate  methods  for  computing  the  matrix  exponential. 

As  an  illustration,  we  compute*  the  exponential  of  the  following  S  using 

t  The  calculation  is  done  on  a-Vax  11/760  using  single  precision  arithmetic  (the  machine 
precision  is  which  is  approximately  7  decimal  significant  digits). 
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ij 

ReS 

ImS 

Re  C 

to  C 

1.1 

O.OOOOOOOe+QO 

1.0000G00e+02 

0. OOOOOOOe +00 

-5. 309649 le-01 

1.2 

5.0000000e+00 

0.0000000e+00 

-2.6548386e-02 

0.0000000e+00 

1.3 

-1.25000Q0e+01 

0.0000000e+00 

-2.8269482e-01 

1.7453294e-02 

1.4 

4.l687500e+01 

0.0000000e+00 

5.2827454e-01 

0.0000C00e+00 

1.5 

-1.562500 0e+02 

0.0000000e+00 

8.2632446e-01 

-1.0206868e-01 

1.6 

6. 25Q0000e+02 

O.OOOOOOOe+OG 

6.244049  le+02 

2.8962694e+0l 

1.7 

-2.6042500e+03 

0.  OOOOOOOe  400 

1.2380859e+01 

3.1803755e+0l 

2.2 

0.0000000e+00 

5.0000000e+01 

0.000000064*00 

-2.6546243e-01 

2.3 

5.0000000e+00 

O.OQOOOOOe+OO 

2.8761 10  le-01 

0.0000000e+00 

2.4 

-1.2500000e+01 

O.QQOOOOOe+OO 

-2.8269482e-01 

-1.7453294e-02 

2.5 

4. 1687500e+0 1 

0.00000000400 

-2.2161484e-01 

6.47887 14e-02 

2,6 

-1.5625000e+02 

0.  OOOOOOOe  400 

8.235 1685e-01 

1.7664604^-01 

2.7 

6.2500000e+02 

0.0000000e+00 

-3.3l46973e+00 

-3.1911498e-01 

3.3 

O.OOOOOOOe+QO 

1.0000000e+01 

0.  OOOOOOOe  4-00 

-2. 5663707e+00 

3.4 

5.0000000e+00 

0.0000000e+00 

-2.6548386e-02 

0.0000000e+00 

3.5 

-1.2500000e+01 

0.0000000e+00 

3.5l969?2e-Cl 

-9.51 9969 le-03 

3.6 

4.1687500e+01 

0.0000000e+00 

9.4289780e-01 

2.8333304e-02 

3,7 

-I.5625000e4.02 

0.0000000e+00 

-1.2384033e+00 

3.5637300e-04 

4.4 

O.OOOOOGOe+OO 

-4.0000000e+01 

O.OQOOOOOe+OO 

-2.300888  le+00 

4.5 

5.000000 0e+00 

0.0000000e+00 

-2.3598766e-0 1 

0.0000000e+00 

4.6 

-1.2500000e+01 

0.0000000e+00 

-1.5802860e-01 

-7. 4799755 e-03 

4.7 

4. 1687500e+01 

0.0000000e+00 

2. 1535873e-0 1 

6.2332950e-03 

5,5 

0.0000000e+00 

-1.0000000e+02 

0. OOOOOOOe +00 

5.309649  le-01 

5.6 

5.0000000e+Q0 

0.0000000e+00 

-2. 6548386e-02 

O.OQOOOQOe+OO 

5.7 

-1.2500000e+01 

0.0000000e+00 

6.6370964e-02 

0. 0000000e+00 

6.6 

0.0000000e+00 

1.0000000e+02 

0.0000000e+00 

-5.309649  le-01 

6.7 

5.00000QOe4-00 

0.0000000e+00 

-2.6548386e-02 

0.0000000e+00 

7.7 

0.0000000e+00 

2. 0000000e+02 

0.0000000e+00 

-1.06l9298e+00 

Table  6.1.2.  Matrix  S  and  the  computed  C  . 
cond(S)  w  3.1  ,  cand(C)  «  1 


Newton’s  interpolating  polynomial  method  (the  coefficients  are  computed  by 
algorithms  in  [4])  on  fl(C).  Every  pair  of  numbers  in  the  following  tables 
represent  the  real  and  imaginary  part  of  the  corresponding  element.  As  a 
summary,  we  have* 


l|e*-/Kec)|| 


w  5.7e  .  e= 2 


-0-24 


(8.1.3) 


1  If  we  apply  our  exponential  directly 

differences),  then - - - *  iSlO.Oe; 

11**11 1 


to  S  (\ising  algorithm  SH(II)  in  [4]  for  the  divided 
almost  200  times  larger  than  the  error  in  (a  1.3). 


f 
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For  reference,  we  give  the  exponential  condition  number  of  S  and  C  (see 
part  I)  : 


cand(S) « 31.0  ,  cond(C) 


i.j 

correctly  rounded  e  s 

computed  ec 

(1.1) 

8.623 189e-01 

-5.063657e-01 

8.623189e-01 

-5.063656e-01 

(1.2) 

-2.439908e-02 

1.026472e-02 

-2.439908e-02 

1.026472e-02 

(1.3) 

7.868409e-03 

2.396153e-01 

7.868374e-03 

2.396 156e-0l 

(1.4) 

7.220490e-02 

-4.62490  le-01 

7.220464e-02 

-4.624946e-0 1 

(1.6) 

7.280552e-01 

-2.93644 le-02 

7. 280577 e-01 

-2.936442e-02 

(1.6) 

5.530223e+Q2 

-2.910502e+02 

5.530224e+02 

-2.910502e+02 

(1.7) 

2.462661e+01 

1.804374e+01 

2.462648e+01 

1.804387e+01 

(2.2) 

9.649660e-01 

-2.623748e-01 

9.649660e-01 

-2.623749e-01 

(2.3) 

3.520579e-02 

-2.255047e-01 

3.520578e-02 

-2. 255047 e-01 

(2.4) 

-8.053 124e-02 

2.258 143e-01 

-8.053  l32e-02 

2.258 146 e-01 

(2.5) 

-1.700779e-01 

- 1.0407 l3e-02 

-1.700736e-01 

-1.04066  le-02 

(2.6) 

8.959ll8e-01 

-2.7749 17e-01 

8.958906e-01 

-2.774836e-01 

(2.7) 

-2.80567 4e+00 

1.929768e+00 

-2.805659e+00 

1.929756e4*00 

(3.3) 

-8.390715e-01 

-5.44021  le-01 

-8.3907 15e-01 

-5.44021  le-01 

(3.4) 

2.010921e-02 

1.72l335e-02 

2.0l0920e-02 

1.721335e-02 

(3.5) 

1.143518e-01 

-1.989942e-01 

1.1435 19e-01 

-1.989943 e-01 

(3.6) 

3.83 1667 e-02 

-7.865904e-01 

3.831670e-02 

-7.865940e-01 

(3.7) 

2.745636e-01 

1.096910e+00 

2. 745654 e-01 

1.096918e+00 

(4.4) 

-6.66938 le-01 

-7.45 113  le-01 

-6.66938  le-01 

-7.451 132e-01 

(4.5) 

-1.042399e-01 

1.274381e-01 

-1.042899e-01 

1.27438  le-01 

(4.6) 

-2.586806e-02 

1.337202e-01 

-2.5868 10e-02 

1.337204e-01 

(4.7) 

-1.971859e-02 

-1.977346e-01 

-1.971872e-02 

-1.977361e-0l 

(5.5) 

8.623 189e-01 

5.063657e-01 

8.623189e-01 

5.063656e-0l 

(5.6) 

-2.531828e-02 

0.000000e+00 

-2.531828e-02 

-7.3686l8e-19 

(5.7) 

5.779 B87e- 02 

-1.574674e-02 

5.779857e-02 

-1.574666e-02 

(6.6) 

8.623 189e-01 

-5.063657e-0l 

8.623 189e-01 

-5.063656e-01 

(6.7) 

-1.834658e-02 

1.875656e-02 

-1.834658e-02 

1.875656e-02 

(7.7) 

4.071 877 e-01 

-8.732973e-01 

4.871877e-01 

-8.732973e-0l 

Table  8.1.4.  The  computed  ec  and  the  correctly  rounded  es. 
||  es-/i(ec)||  «  5.7x2“84  »  10'6. 
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8.2.  Stability  analysis  of  Step  [4] 

When  X '  is  computed  according  to  X>S*=S*X\  it  may  happen  that  the  com¬ 
puted  X'  is  nowhere  close  to  the  exact  answer  even  though  X'S'^S’X'.  In  this 
section,  we  show  that  the  commutativity  play  an  important  role  and  is  the 
essential  condition  determining  whether  e5  is  close  to  ec .  Thus  it  is  not 
essential  that  X'  be  accurate,  only  that  it  must  nearly  commute  with  S*. 
Recall  C  is  computed  by  5— uX  (o=2rri)  and  X-  UX'  UB.  Let  E=X'S'~S'X\  we 
have  E=UHXSU-UffSXU=UH(XS-SX)U;  hence  \\E\\  =||ATS-SX||  ,  since  U 
is  unitary. 

Theorem  8. 1.  With  CtX ,  and  E  as  given  above 

||  es-ec||  ^|-||  £||  ♦Hc«.  (8.2.1) 

We  first  prove  the  following  two  lemmas. 

Lemma  8.2.  For  square  conformable  matrices  A  and  B 

t 

g 1  {A+B) _e  tA . g  tB  -J g  (t  -r){A +B)(gg tA  _e rA  q  } g  rB  d  T 
0 

Proof  of  Lemma  8.2.  Let  Y{t)-ei^A^B^—etAeiB.  The  derivative  of  Y{t)  is 

4r  Y(t  )=(A  +B)b* U+B)-Ae  tAe tB  -e tA  Be tB 

o it 

=(A+B)Y(t)+(BetA-elA  B)erB 

Therefore 

jL(e-«A+B)y(t))=e-HA+B)(BetA-*UB)etB. 

at 

t 

Y(t)= jf e ~T)(-4 *B\Be tA -e rA B)erBd r.  - 


Since  7(0) =0. 
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Lemma  B.3. 

t 

BetA-etAB= f e^A(BA~AB)erAdT. 

0 

Proof  of  Lemma  8. 3.  Let  Z(t  )=BetA  -e  tAB.  We  have 

~-Z(t)=BAeu-ABtAB 

=BAb  tA  -ABe  tA+ABetA-MtAB 
=AZ(t)+(BA-AB)etA. 

Therefore, 

e  ~tA(~-Z(t  )-AZ(t  ))=e  ~u  (BA  ~AB)e tA 
^-(e-tAZ(t))=e~tA(BA-AB)etA. 

Since  Z( 0)=0 

t 

B'tAZ(t)=fe^A(BA-AB)erAdr 

o 

and  finally 

t 

Z{t)-Je^^A{BA-AB)erAdr.  • 
o 

Proof  of  Theorem  8.1,  After  the  transformation  in  step  [3],  the  diagonal  of  X ' 
is  of  form 

diag  (X')=(a,a,..,a,b,b c,c,..,c), 

where  a, 6  ,..,c  denote  distinct  integers.  According  to  (7.3.2),  X'  has  the  fol- 
lowing  form 
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a  0  0  •  •  *  * 
•  •  0  *  •  *  • 
a  *  *  *  * 

X'  =  * 

c  0  0 

••  0 


We  claim  that 

exp(27rUT)  =  I. 

It  suffices  to  consider  the  case  that  diag(X')  has  only  two  distinct  integers, 

o  0  0  0  *  *  * 

•  0  0  *  *  * 

a  0  •  *  • 

a  *  *  * 

6  0  0 

••  0 

6 

/  It  is  obvious  that  F=exp(27riX')  is  of  form 

exp(2rriX')  =  F  = 

Since  FX'—X’F,  by  comparing  the  (l,2)-th  block  of  FX’ 

CKi+^i zblz  =  a /,Fl2+(*)/2. 

Since  a*b ,  iri2=0. 

From  step  [5]  of  MAR,  A  =  UX'Uff;  hence 

Therefore,  if  we  write 

g  g  2tr iX ,  g  C 


a/l 

”  1  6/z 


then  Lemma  8.2  implies  (with  t  =  1) 
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es-ec- fe^s  (CeT*r{X-eyi*xC)  erC  dr.  (8.2.2) 

0 

Using  ||  es||^ell5H,  Lemma  8.3  implies  (for  t^O) 

||  CqiZkIX  _  er2trtAf^|| 

^  ||  C(ZmTX)-(2mTX)C\\  •  y'e(r-6l|2»«,ll.ef||a««r||d| 

0 

^  27r||  ( C+ZniX)(X)-(X)(C+2iriX)\\  •Te8’"'11*11 
^27r||S%-XS||reTllJfll. 

Since  ||  £*11  =  ||  S*-%S||  and  ||  S’||^27t||  X\\  +||  C|| .  (8.2.2)  implies 

||  e5-ec||  =fi||  E\\ e «Mian  *11  C\\).fTdT. 

0 

and  the  bound  (8.2.1)  follows.  ■ 

Remark.  Using  different  bounds  on  ||  es||  yields  different  results.  For  exam¬ 
ple,  if  one  uses  ||es||^e^5\  where  /. i(S)  is  the  log  norm  of  S 
(/i(S):=max{Re(Xi)  :  Aie\(S)}),  then  the  same  reasoning  as  above  yields 

2 

See  part  I  Appendix  I. A  for  other  useful  bounds  on  |j  es  || . 


Appendix  II.A.  (A  Bound  on  The  Number  of  Swaps) 


Our  intention  here  is  to  examine  how  many  adjacent  swaps  are  needed 

to  reorder  n  integers  k=(fc1#A:2 . A:n)  so  that  coincident  A \*s  go  together. 

We  call  such  an  ordering  confluent  (or  a  confluent  permutation  of 

(fcj . fcnD-  To  clarify  our  meaning  of  swapping,  we  will  use  swap  to  denote 

the  exchange  of  adjacent  elements,  and  exchange  to  denote  the  exchange  of 
any  two  elements. 

There  are  many  ways  to  reorder  k.  For  example,  if  k=  (2, 3, 2. 5, 3, 2),  then 
there  are  six  confluent  permutations  of  them: 

(2,2,2,5.3.3),  (5,3, 3, 2, 2,2),  (3,3.5.2.2,2). 

Our  problem  here  is  to  determine  a  confluent  permutation  k*  for  k  such  that 
the  transformation  from  k  to  k  requires  a  nearly  minimal  number  of  swaps. 
To  formulate  the  problem  clearly,  we  follow  [l]  in  using  the  notion  of  mul¬ 
tiset.  A  multiset  is  like  a  set  (where  a  set  is  a  collection  of  distinct  elements) 
except  that  it  can  have  repetitions  of  identical  elements.  For  example, 

M=\a,a,a,b  tb  ,c  %d,d,d,dl, 

which  contain  3  ds,  2  fa's,  1  c,  and  4  ds.  We  may  also  indicate  the  multiplici¬ 
ties  of  elements  in  another  way,  namely 

M={3at2 b.cAdl 

A  permutation  of  M  is  an  arrangement  of  its  elements  ,  e.g.  , 

cabddabdad . 

From  another  point  of  view  we  could  call  this  a  string  of  letters,  containing  3 
ds,  26's,  lc,  and  4  ds.  For  convenience,  we  use  per(M)  to  denote  the  set  of 
all  permutations  of  M  where  x=(xlf  .  .  .  Izn)  is  a  typical  element. 
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yie  consider  the  multiset  of  integers.  Let  M  =\kuk2 . Jfc*  J  where  k^s 

are  integers;  then  M  is  a  multiset.  If  a1#  .  .  .  .a*  denote  the  distinct  integers 
in  U ,  then  M  can  be  written  as 

M=\nx- n2  a2l  .... 

where  rti  is  the  multiplicity  of  a*,  k  confluent  permutation  of  A#  is  a  permu¬ 
tation  of  M,  (op)Cper(A#),  such  that 

ap-(nj>WaPlD>  np(2>'ar(2> . nj>(O  ap(o)* 

where  p  :p(i),  i=l,2,....l  is  a  permutation  of  1,2 Le.,  peper{l,2,...,lj. 
Since  we  are  interested  in  the  number  of  swaps  needed  to  transform  one  per¬ 
mutation  to  another,  we  define  r  (xy)  to  be  the  minimal  number  of  swaps 
needed  to  change  x  to  y. 

For  example,  if  x=(2,3,2,5,3,2)  and  y=(2,2,2,5.3,3),  then  r(xy)=5.  Each 
new  line  is  obtained  from  the  previous  one  by  a  swap 

x  =  (2.  3.  2,  5.  3.  2) 

-*  (2.  2.  3.  5.  3.  2) 

-  (2.  2.  5.  3.  3.  2) 

-*  (2,  2.  5.  3.  2.  3) 

**  (2.  2.  5.  2.  3.  3) 

-  (2,  2,  2.  5,  3.  3)  =  y. 

With  the  above  notation,  we  restate  our  problem  as  follow 

Problem.  Given  any  permutation  k=(A:1 . fc*)  of  M,  find 

J>€per{l,2,...,lj  ( l  is  the  number  of  distinct  integers  in  M)  so  that  the  r(k,a^) 
is  minimal  over  per{  1,2,. ...1). 

This  is  a  NP-complete  problem,  that  is,  it  probably  requires  exponential 
time  (in  l)  to  find  the  minimal  solution  (for  one  thing,  there  are  If  choice  of 


t 
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p).  In  fact,  this  problem  is  related  to  the  acyclic  subgraph  problem ,  (see 
[2^),  which  is  well  known  to  be  NP-complete.  We’ll  establish  the  connection 
between  our  problem  and  the  acyclic  subgraph  problem  later.  First,  we  show 
that  we  can  always  find  a  p  such  that  r(k,ap)  is  less  than 

—  ( 1 — ^ ^  n(n— 1)/4.  Let  -p  denotes  the  reverse  permutation  of  p:  if 

P=P(1)»P( 2) . then  -p=p(l),p(l -l),...,p(l)  (Le.,  (-p)(i)=p(Z-i)). 

Also  we  use  I  to  denote  the  identity  permutation  of  per(l,2,...,Z}.  Note  that 
-p  is  not  the  inverse  ofp. 

Theorem  A.1.  For  any  given  p, 

min  {r(k.ap).  r(k.a_,)J  s  ^-(l-y-). 

Proof.  It  suffices  to  consider  p=/,  for  one  can  always  replace  04  by  ap^).  Let 
rjij  denote  the  number  of  Oj  in  front  of  c^.  By  that  we  mean  for  each  a*,  we 
count  how  many  o^'s  are  on  the  left  of  this  a*  (in  k).  then  take  the  sum  over 
all  a*  and  call  it  the  number  of  Oj  before  a*.  For  example,  if 

where  aj=7,  a2=8,  a3=5, 

then  and  17i,2=3  •  (Vtj  can  be  considered  as  the  number  of 

inversions  between  a*  and  aj.  This  is  a  generalization  of  the  inversions 

defined  in  [1]  p.11:  given  (ij)  in  order,  and  a  permutation  k=(A:2 . 1^), 

we  call  a  pair  an  inversion  of  the  permutation  with  respect  to 

(ot.a>)  if  fcjn |=a^  is  before  (to  the  left  of)  fcm2=Oi.  r)i4  is  just  the  total  number 
of  inversions  with  respect  to  (o^oj).) 

Lemma  A.  2. 

r( *.«))- Zmj-  (al) 
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Proof  of  Lemma  A 2,  By  the  definition  of  rjij,  there  are  before  a*, 

therefore  at  least  r)itj  swaps  are  needed  between  a*  and  a j  in  order  to  bring 
all  CLi  before  Oj.  It  is  then  obvious  that  the  number  of  swaps  needed  to  bring 

all  a i  to  the  left  of  all  a2,  ...  ,  a*  is  at  least  However,  since  there 

t«2 

are  only  non-aj  before  alt  we  can  move  all  a4  to  the  far  left  in  exactly 

4=2 


swaps.  The  same  reasoning  shows  that  there  are  swaps  needed 

<«s 

to  bring  all  a2  to  the  left  of  all  a3,a4,  .  .  .  ,  Oj,  and  so  on.  The  lemma  is  proved. 


Notice  that  Lemma  A.2  also  implies  that  JjVij  is  equal  to  r (k,a_/). 

i>i 


Lemma  A3. 


Hvij* 


n* 

2 


Proof  of  I&rruma.  A3.  Recall  rt^  is  the  multiplicity  of  04.  We  first  show  that 


Vij+Vj.i  =7k*j  (a2) 

for  i*j.  There  are  always  nj  cij  on  both  sides  of  each  of  a*.  Therefore,  the 
number  of  swappings  to  bring  each  of  Of  to  the  left  of  all  aj  and  to  the  right 
of  all  Oj  is  TLj .  Since  there  are  of  a^,  the  total  swapping  is  Because  of 

(a2),  we  have 


HVij  =  S  Vij  + 

«•*!  *>j  t  >i 


-  SCnu  +  Vi. i)  - 

*>j  i>j 


=  2  -  hi71*2 

l *ij*nc  c  im\ 

=  |-(n,+  - +n,)*- 

C  C  4»1 


t 
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The  last  Inequality  follows  from  Cauchy-Schwartz  inequality 

(ni+"-+7ii)2«f-(n?+---njz).  • 

n2  « 

Lemma  A- 3  implies  that  either  or  2*7*  j  is  -j— (l— — );  therefore,  by 

f>t  i>j  4  1 

_  2  , 

Lemma  A.2,  either  r(k,a^)  or  r(k,a_/)  is  less  than  — - —  (l — 7—)-  This  completes 

4  4 

the  proof  of  the  theorem.  ■ 


nB  1 

Remark  1.  Theorem  A.1  implies  -y(l— y)  is  an  upper  bound  on 
minr(k9ap).  This  bound  is  attainable,  e.g.,  (kx . &*)=(!, 2...,  —  ^...,1) 

p  2  2 

when  n  is  even.  One  can  easily  see  that  this  example  takes  at  least 
n(n— 2)/4  swaps  to  bring  ail  pair  numbers  together. 


Remark  2.  From  the  proof  of  the  theorem  our  problem  is  equivalent  to 

minr(lc.ap).  (a3) 

or. 


min 

p 


.  £ ,  Vp(i)j>uy 

pW<pU) 


(a4) 


If  we  change  min  to  max,  then  (a4)  is  exactly  the  quadratic  assignment  prob¬ 
lem  ([2]),  a  NP-complete  problem.  Various  methods  (see  [2])  have  been  sug¬ 
gested  to  And  the  maximal  (or  minimal)  solution.  However,  in  our  Applica¬ 
tion,  there  is  little  incentive  for  a  minimal  solution.  Since  either  a;  or  a../ 


takes  less  than  n(n— 1)/4  swaps,  it  is  quite  sufficient  to  choose  the  'smaller* 
one  (the  example  in  Remark  1  shows  that  one  cannot  in  general  expect  a 
further  reduction  on  the  number  of  swaps). 

In  Appendix  Il.B,  a  Fortran  program  for  matrix  argument  reduction  is 
given  in  which  the  values  of  o^'s  are  set  up  according  to  the  first  appearance 
of  distinct  integer  in  k  from  left  to  right,  and  then  a  comparison  on  r(k,a/) 
and  r(k,a_/)  determines  which  permutation  will  be  used. 


Appendix  II.  B:  Program  Listing  and  Usage 

The  subroutine  for  matrix  argument  reduction  is  "modt”.  which  will  return 
mod(T)  in  the  upper  part  of  T  (the  lower  part  will  store  the  original  T),  The 
user  must  provide  subroutine  "modi",  the  one  dimensional  argument  reduc¬ 
tion  function.  The  details  of  the  numerical  method  can  be  found  in  section  7. 


Subroutine  modt(m,nat,f,z,w.tau). 

Given  a  triangular  matrix  7\  this  subroutine  computes  mod(tau*T) 
according  to  the  algorithm  MAR  in  section  7.  The  resulted  matrix  will  be 
stored  in  the  strict  upper  part  of  T  with  the  diagonal  in  z  (l),...,z  (n).  The  ori¬ 
ginal  matrix  will  be  saved  in  the  lower  T .  The  user  must  provide  the  subrou¬ 
tine  modi.  There  is  a  subprogram  "swap”. 

m  the  global  dimension  of  matrix  t(f 

n  the  dimension  of  t,f 

t  input  complex  matrix  T 

f  working  complex  matrix 

z  output  complex  vector*  store  the  diagonal  of  mod(tau*T) 
p,q  integer  working  array, 

tau  input  complex  parameter. 


subroutine  modern, n,t,ffzfp,q# tau) 
complex  t(m,n),f(mfn),z(n)ftau 
integer  p(n),q(n) 

c  This  subroutine  computes  the  argument  of  tau*t.  Result  is  stored  in  strict 
c  upper  t  with  z  the  diagonal.  The  original  triangular  will  be  stored  in 
c  the  lower  t  with  diagonal.  An  user  provided  subroutine  "modi”  is  required, 
complex  x,inf,w 
inf  =  ( 1 0e37a  1  Oe  37) 

c  m# 

w=  user  provided  (the  period) 

c  #### 

do  10  i=l,n 
p(i)=0 

z(i)=tau*t(iti) 
call  modl(z(i),xtq(i)) 

10  continue 

do  20  i=l, n 


do  20  j=l,n 

x=z(i)-zG) . 


if(abs(real(x))+abs(imag(x)).gt0.1e0)  goto : 
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30 


40 
c  . 


20  continue 
c  ...find  out  the  ordering 
p(l)=l 
1=1 
in=0 

do  40  j=2,n 
do  30  i=l,j-l 
if(p(j)  eq.O)  then 

iffqfn.eq.qO))  p(j)=p(i) 
if(q(i).ne.q(j))  in=in+l 

else 

if(pO)-gt.p(i))  in=in+l 
if(p(j).lLp(i))  in=in-l 

endif 

f(i,j)=tau*t(i,j) 
t(j.i)=inf 
continue 

if(p(j).ne.O)  goto  40 
1=1+1 
pG)=i 

continue 

swapping  forward 
itest=0 
isq=sign(l,in) 
do  50  i=2,n 
do  50  j=n,i,-l 

if(isq*(pO);p(i-l)).ge.O)  goto  50 

x=fG-l.j)/(z(j)-zU-l)) 
tG,i-l)=conjg(x) 
call  swap(mfn,flz,q,j1x) 
itest=l 

PG-i)=i 

continue 
c  ...reduction 

do  70  j=l.n-l 
do  70  i=l,n-j 
if(p(i).eq.p(i+j))  then 

fQ+i.i)=f(i.j+») 
f(i.i+j)=0 


50 


else 


60 


endif 
70  continue 
c  ...back  swapping 

do  80  j=n-l,l,-l 
do  80  i=j+l,n 
x=t(i.j) 


x=f(i.i+j)*cmplx(q(i+j)-q(i)) 
do  60  k=l,j-l 

x=x+f(i+k,i)*f(i+k.i+j)-f(i.i+j-k)*f(i+j.i+j-k) 

f(i+j,i)=f(i.i+i) 

f(i,i+j)=x/(z(i+j)-2(i)) 


if(x.eq.infl  goto  80 
call  swap(m,n,f,z,q,-i,x) 

80  continue 
c  ...final  reduction 

if(itest.eq.O)  goto  110 
do  100  i=n-l,l,-l 
itest=0 
do  100  j=i+l,n 
if(q(i).eq.q(j))  then 

iflitest.eq.O)  f(i,j)=0 

if(itest.ne.0.and.j-i.eq.2)  f(i,j)=t(i,i+l)*t(i+l,i+2)* 

*  cmplx(q(i+  l)-q(i))/(t(i+  l,i+  l)-t(i,i))/(t(i+  l,i+  l)-tQ.j)) 

else 

itest=  1 

x=t(ifj)*cmplx(q(j)-q(i)) 
do  90  k=l,j-i-l 

90  x=x+t(i,i+k)  *f(i+k,  i)-f(i,  j-k)  *tO*k,  j) 

f(i.j)=x*tau/(z(j)-z(i)) 

endif 

100  continue 
110  do  120  i=l,n 

call  modl(z(i),x,k) 
z(i)=x+cmplx(q(i)-k)  *w 
do  120  i=i+l,n 
t(j.i)=t(i.i) 

120  t(i,j)~tau*t(j.i)-f(i.j)*w 

return 

end 


subroutine  swap(m,n.f,z,q.k,x) 
complex  f(m,n).z(n),ullIul2.u21,x 

c  This  subroutine  swap  the  |k|-th  and  |k|-l-th  diagonal  of  input  matrix  f. 
c  If  input  k  is  positive,  than  we  perform  UH FU.  else 
c  the  inverse  transformation  UFUH ,  where  U  is  defined  as 
c  in  section  7.2. 

integer  q(n) 

u21=cmplx(le0/sqrt(le0+real(conjg(x)*x))) 

ull=x*u21 

ul2=u21 

Sx.eq.(0e0,0e0))  goto  5 
k.gt.0)  then 
u  12=-u  1  l/conjg(x) 
else 
k=-k 

u21=-ul  l/conjg(x) 
endif 

5  continue 

do  10  i=l,k-2 

x  =f(i,k-l)*ull+f(i,k)*u21 
f(i.k)  =f(i.k-l)*ul2+f(i,k)*ull 
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10  f(i,k-l)=x 

ull=conjg(ull} 
u21=conjg(u21) 
ul2=conjg(ul2) 
do  20  j=k+l,n 

x  =f(k-l,j)*ull+f(k,j)*u21 
f(k,j)  =f(k-l.j)*ul2+f(k.j)*ull 
20  f(k-l,j)=x 

q(k-l)=i 
x  =z(k) 
z(k)  =z(k-l) 
z(k-l)=x 
return 
end 


subroutine  modi  (z.x.k) 
complex  x,z 
integer  k 

m 

w=  user  given 

m 


. User  provided . 

Given  argument  x,  return  the  reduced  argument  z  so  that 
|z|=|x-kwf  is  minimized  over  all  integers  k. 


return 

end 


/ 
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